An efficient estimator is constructed for the quadratic covariation
or integrated covolatility matrix of a multivariate continuous
martingale based on noisy and non-synchronous observations under
high-frequency asymptotics. Our approach relies on an asymptotically
equivalent continuous-time observation model where a local
generalised method of moments in the spectral domain turns out to
be optimal. Asymptotic semiparametric efficiency is established in
the Cramér-Rao sense. Main findings are that non-synchronicity of
observation times has no impact on the asymptotics and that major
efficiency gains are possible under correlation. Simulations illustrate
the finite-sample behaviour.

1. Introduction. The estimation of the quadratic variation or inte- 
grated volatility of a semi-martingale is a key question, both from a theo- 
retical viewpoint as well as for applications, particularly in finance. Here we 
treat the multi-dimensional case, where the quadratic covariation or inte- 
grated covolatility matrix is the quantity of interest. It turns out that the 
richer geometry, e.g. due to non-commuting matrices, generates new effects 
and calls for a deeper mathematical understanding. Covariation estimates 
are particularly important in various financial applications, for instance, as 
inputs in portfolio allocation problems, quantification of risk, hedging or as- 
set pricing. The availability of high-frequency data opens up new ways for 
inference. As the data is typically polluted by observational noise, e.g. by 
microstructure frictions, estimation in these models is far from obvious and 
furnishes unexpected results. 
We focus on the fundamental statistical model where the d-dimensional
discrete-time process

$$(\mathcal{E}_0)\qquad Y_i^{(l)} = X^{(l)}_{t_i^{(l)}} + \varepsilon_i^{(l)}, \quad 0 \le i \le n_l,\ 1 \le l \le d,$$

is observed with the d-dimensional continuous martingale

$$X_t = X_0 + \int_0^t \Sigma^{1/2}(s)\,dB_s, \quad t \in [0,1],$$

in terms of a d-dimensional standard Brownian motion $B$ and the squared
(instantaneous or spot) covolatility matrix

$$\Sigma(t) = (\Sigma_{lr}(t))_{1 \le l,r \le d} \in \mathbb{R}^{d\times d}.$$

The signal part $X$ is assumed to be independent of the observation noise $\varepsilon$.
The observation errors $\varepsilon_i^{(l)}$ are assumed to be mutually
independent and centered normal with variances $\eta_l^2$. The observation
times are given via quantile transformations as $t_i^{(l)} = F_l^{-1}(i/n_l)$ for some
distribution functions $F_l$. While this model is certainly an idealisation of many
real data situations, its precise analysis delivers a profound understanding
and thus serves as a basis for developing procedures in more complex models.
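To make the observation scheme concrete, the following is a minimal simulation sketch of noisy, non-synchronous observations. It is not taken from the paper: the function name `simulate_e0`, the time-constant covolatility matrix, the fine Euler grid and the use of a Cholesky factor (which gives the same law as the symmetric root) are all assumptions made for illustration.

```python
import numpy as np

def simulate_e0(sigma, eta, n, quantile_fns, rng, grid_size=20_000):
    """Simulate Y_i^(l) = X^(l) at t_i^(l) plus noise (hypothetical helper).

    sigma        : (d, d) covolatility matrix, assumed constant in time
    eta          : (d,) noise standard deviations
    n            : sequence of d sample sizes n_l
    quantile_fns : list of d functions u -> F_l^{-1}(u)
    """
    d = sigma.shape[0]
    # Euler path of X on a fine grid; increments have covariance sigma * dt.
    root = np.linalg.cholesky(sigma)
    dt = 1.0 / grid_size
    incr = rng.standard_normal((grid_size, d)) @ root.T * np.sqrt(dt)
    path = np.vstack([np.zeros(d), np.cumsum(incr, axis=0)])
    grid = np.linspace(0.0, 1.0, grid_size + 1)

    times, obs = [], []
    for l in range(d):
        # Observation times t_i = F_l^{-1}(i / n_l), i = 0, ..., n_l.
        t = quantile_fns[l](np.arange(n[l] + 1) / n[l])
        x = np.interp(t, grid, path[:, l])  # path value at each t_i
        y = x + eta[l] * rng.standard_normal(n[l] + 1)
        times.append(t)
        obs.append(y)
    return times, obs
```

Each component gets its own sample size and its own time grid, so the observation times of different components generally do not coincide.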

Covariation estimation is a core research topic in current financial
econometrics and various approaches exist. Let us mention the
quasi-maximum-likelihood method by Aït-Sahalia et al. [1], realised kernels by
Barndorff-Nielsen et al. [3], pre-averaging by Christensen et al. [5] and the local
spectral estimator by Bibinger and Reiß [4]. In contrast to the univariate case,
however, the asymptotic properties are very involved, difficult to compare 
and a lower efficiency bound was lacking as a benchmark. 

Building on the idea of locally constant approximations, we propose a 
local method of moments (LMM) estimator in the spectral domain which 
is shown to be asymptotically efficient. We perform an asymptotic analysis 
where the sample sizes $n_1, \dots, n_d$ tend to infinity. In Section 2 (strong)
asymptotic equivalence in Le Cam's sense is established by Theorem 2.4
with the signal-in-white-noise model

$$(\mathcal{E}_1)\qquad dY_t = X_t\,dt + \mathrm{diag}(H_{n,l}(t))_{1\le l\le d}\,dW_t, \quad t \in [0,1],$$

where $W$ is a standard d-dimensional Brownian motion independent of $B$
and the local noise level is given by

$$H_{n,l}(t) := \eta_l\,(n_l F_l'(t))^{-1/2}. \tag{1.1}$$

The imposed regularity condition is that $\Sigma(t)$ is the sum of an $L^2$-Sobolev
function of regularity $\beta$ and an $L^2$-martingale and the size of $\beta$
accommodates for asymptotically separating sample sizes $(n_l)_{1\le l\le d}$.

Let us recall that if two sequences of statistical experiments are asymp- 
totically equivalent, then any statistical procedure in one experiment has a 
counterpart in the other experiment with the same asymptotic properties 
for bounded loss functions, see Le Cam and Yang [17] for a thorough
introduction. Our proof is constructive such that the procedure that we shall
develop for $(\mathcal{E}_1)$ has a concrete counterpart in $(\mathcal{E}_0)$ with the same asymptotic
properties.

A remarkable theoretical consequence of this result is that under noise the
asynchronicity of the data does not affect the asymptotically efficient
procedures (it is of smaller asymptotic order). In model $(\mathcal{E}_1)$ the distribution
functions $F_l$ only generate a varying local noise level $H_{n,l}(t)$, but the shift
between observation times of different processes does not matter. This is in
sharp contrast to the noiseless setting where the variance of the
Hayashi-Yoshida estimator [12] suffers from errors due to asynchronicity, which
carries over to the pre-averaged version by Christensen et al. [5] designed for
the noisy case.

In Section 3 we consider the continuous-time model $(\mathcal{E}_1)$ and go over to
a block-wise constant approximation. Empirical Fourier coefficients yield
local spectral statistics $(S_{jk})$ in the spirit of Reiß [19]. On each block we
apply locally a generalised method of moments, using a weighted sum of
the empirical covariance matrices $S_{jk}S_{jk}^\top \in \mathbb{R}^{d\times d}$ and a bias correction. The
optimal weighting for estimating an entry of the covariation matrix combines
in general all entries of $S_{jk}S_{jk}^\top$. The non-commutativity of different Fisher
information matrices then implies in particular that the volatility estimation
for one coordinate process gains in efficiency when using data for all
other potentially correlated processes $X^{(r)}$, see Sections 4.2 and 5 for details
and the improvement with respect to the approach in Bibinger and Reiß [4].
Note the contrast with i.i.d. observations of a Gaussian vector where the 
empirical variance of one component is an efficient estimator and using the 
other entries cannot improve the variance estimator unless the correlation 
is known, cf. the classical Example 6.6.4 in Lehmann and Casella [18]. 

In Theorem 3.2 a multivariate central limit theorem (CLT) is provided for
an oracle LMM (Local Method of Moments) estimator, using the unknown
optimal weights and an information-type matrix for normalisation.
Specifying to sample sizes of the same order $n$, Corollary 3.3 yields a CLT with
rate $n^{1/4}$ and a covariance structure between matrix entries, which is given
explicitly by concise matrix algebra. Using pre-estimated weight matrices, a
fully adaptive version of the LMM-estimator is obtained, which by Theorem
3.4 shares the same asymptotic properties as the oracle estimator.

Another main result of this work is that the asymptotic covariance
structure of the LMM-estimators is optimal in a semiparametric Cramér-Rao
sense. In Section 4 a lower bound proof is achieved by a combination of
space-time transformations and advanced calculus for covariance operators.
The concrete form of the asymptotic variance is discussed for some key
settings, thus generalising the univariate optimality theory by Gloter and Jacod
[10] and Reiß [19] to the multivariate case. The discretisation and
implementation of the estimator for model $(\mathcal{E}_0)$ is briefly described in Section 5 and
presented together with some numerical results for a simple toy model and 
a more complex and realistic scenario. The finite sample behaviour of the 
LMM estimators is well predicted by the asymptotic theory (even in cases 
where it does not apply formally) and some comparison with competing 
procedures is provided. 

2. From discrete to continuous-time observations. 

2.1. Setting. First, let us specify different regularity assumptions. For
functions $f : [0,1] \to \mathbb{R}^m$, $m \ge 1$ or also $m = d \times d$ for matrix values, we
introduce the $L^2$-Sobolev ball of order $\alpha \in (0,1]$ and radius $R > 0$,

$$H^\alpha(R) = \{ f \in H^\alpha([0,1],\mathbb{R}^m) \mid \|f\|_{H^\alpha} \le R \} \quad\text{where}\quad \|f\|_{H^\alpha} := \max_{1\le i\le m} \|f_i\|_{H^\alpha},$$

which for matrices means $\|f\|_{H^\alpha} := \max_{1\le i,j\le d} \|f_{ij}\|_{H^\alpha}$. We also consider
Hölder spaces $C^\alpha([0,1])$ and Besov spaces $B^\alpha_{p,q}([0,1])$ of such functions.
Canonically, for matrices we use the spectral norm $\|\cdot\|$ and we set
$\|f\|_\infty := \sup_{t\in[0,1]} \|f(t)\|$.

In order to pursue asymptotic theory, we impose that the deterministic
samplings in each component can be transferred to an equidistant scheme
by respective quantile transformations independent of $n_l$, $1 \le l \le d$.

Assumption 2.1-($\alpha$). Suppose that there exist differentiable distribution
functions $F_l$ with $F_l' \in C^\alpha([0,1])$, $1 \le l \le d$, with $F_l(0) = 0$, $F_l(1) = 1$ and $F_l' > 0$,
such that the observation times in $(\mathcal{E}_0)$ are generated by $t_i^{(l)} = F_l^{-1}(i/n_l)$,
$0 \le i \le n_l$, $1 \le l \le d$.

We gather all assertions on the instantaneous covolatility matrix function
$\Sigma(t)$, $t \in [0,1]$, which we shall require at some point.

Assumption 2.2. Let $\Sigma : [0,1] \to \mathbb{R}^{d\times d}$ be a possibly random function
with values in the class of symmetric, positive semi-definite matrices,
independent of $X$ and the observational noise, satisfying:

(i-$\beta$) $\Sigma \in H^\beta([0,1])$ for $\beta > 0$.

(ii-$\alpha$) $\Sigma = \Sigma^B + \Sigma^M$ with $\Sigma^B \in B^\alpha_{1,\infty}([0,1])$ for $\alpha > 0$ and $\Sigma^M$ a
matrix-valued $L^2$-martingale.

(iii-$\underline{\Sigma}$) $\Sigma(t) \ge \underline{\Sigma}$ for a strictly positive definite matrix $\underline{\Sigma}$ and all $t \in [0,1]$.

Let us briefly discuss the different function spaces, see e.g. Cohen [7,
Section 3.2] for a survey. First, any $\alpha$-Hölder-continuous function lies in the
$L^2$-Sobolev space $H^\alpha$ and any $H^\alpha$-function lies in the Besov space $B^\alpha_{1,\infty}$,
where differentiability is measured in an $L^1$-sense. The important class of
bounded variation functions (e.g., modeling jumps in the volatility) lies in
$B^1_{1,\infty}$, but only in $H^\alpha$ for $\alpha < 1/2$. In particular, part (ii-$\alpha$), $\alpha \le 1$, covers
$L^2$-semi-martingales by separate bounds on the drift (bounded variation)
and martingale part. Beyond classical theory in this area is the fact that
also non-semi-martingales like fractional Brownian motion $B^H$ with Hurst
parameter $H > 1/2$ give rise to feasible volatility functions in the results
below, using

$B^H \in C^{H-\varepsilon} \cap B^H_{1,\infty}$ for any $\varepsilon > 0$ from Ciesielski et al. [6].

In the sequel the potential randomness of $\Sigma$ is often not discussed
additionally because by independence we can always work conditionally on $\Sigma$.
Finally, let us mention that we could also weaken the Hölder-assumptions
on $F_1, \dots, F_d$ towards Sobolev or Besov regularity at the cost of tightening
the assumptions on $\Sigma$. For the sake of clarity this is not pursued here.

Throughout the article we write $Z_n = O_P(\delta_n)$ and $Z_n = o_P(\delta_n)$ for a
sequence of random variables $Z_n$ and a sequence $\delta_n$, to express that $\delta_n^{-1} Z_n$
is bounded or tends to zero in probability, respectively. Analogously $O$ (or
equivalently $\lesssim$) and $o$ refer to deterministic sequences. We write $Z_n \asymp Y_n$ if
$Z_n = O_P(Y_n)$ and $Y_n = O_P(Z_n)$ and the same for deterministic quantities.
$E_d$ denotes the d-dimensional unit matrix and $\delta_{p,q} = \mathbb{1}(p = q)$ equals 1 for
$p = q$ and 0 otherwise.

2.2. Continuous-time experiment. 

Definition 2.3. Let $\mathcal{E}_0((n_l)_{1\le l\le d}, \beta, R)$ with $\beta \in (0,1]$, $R > 0$,
be the statistical experiment generated by observations from $(\mathcal{E}_0)$ with
$\Sigma \in H^\beta(R)$. Analogously, let $\mathcal{E}_1((n_l)_{1\le l\le d}, \beta, R)$ be the statistical experiment
generated by observing $(\mathcal{E}_1)$ with the same parameter class.

As we shall establish next, experiments $\mathcal{E}_0$ and $\mathcal{E}_1$ will be asymptotically






M. BIBINGER, N. HAUTSCH, P. MALEC & M. REISS 



equivalent as $n_l \to \infty$, $1 \le l \le d$, at a comparable speed, denoting

$$n_{\min} = \min_{1\le l\le d} n_l \quad\text{and}\quad n_{\max} = \max_{1\le l\le d} n_l.$$

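Theorem 2.4. Grant Assumption 2.1-($\beta$) on the design. The statistical
experiments $\mathcal{E}_0((n_l)_{1\le l\le d}, \beta, R)$ and $\mathcal{E}_1((n_l)_{1\le l\le d}, \beta, R)$ are asymptotically
equivalent for any $\beta \in (0, 1/2]$ and $R > 0$, provided

$$n_{\min} \to \infty, \qquad n_{\max} = o\big((n_{\min})^{1+\beta}\big).$$

More precisely, the Le Cam distance $\Delta$ is of order

$$\Delta\big(\mathcal{E}_0((n_l)_{1\le l\le d}, \beta, R),\ \mathcal{E}_1((n_l)_{1\le l\le d}, \beta, R)\big) = O\Big(R^2 \Big(\sum_{l=1}^d n_l/\eta_l^2\Big)\, n_{\min}^{-1-\beta}\Big).$$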



By inclusion, the result also applies for $\beta > 1/2$ when in the remaining
expressions $\beta$ is replaced by $\min(\beta, 1/2)$. A standard Sobolev
smoothness of $\Sigma$ is $\beta$ almost $1/2$ for diffusions with finitely many or absolutely
summable jumps. In that case, the asymptotic equivalence result holds if
$n_{\max}$ grows more slowly than $n_{\min}^{3/2}$. Theorem 2.4 is proved in the appendix
in a constructive way by warped linear interpolation, which yields a readily
implementable procedure, cf. Section 5 below.

3. Localisation and method of moments. 

3.1. Construction. We partition the interval $[0,1]$ into blocks $[kh, (k+1)h)$
of length h. On each block a parametric MLE for a constant model could be 
sought for. Its numerical determination, however, is difficult and unstable 
due to the non-concavity of the ML objective function and its analysis is 
quite involved. Yet, the likelihood equation leads to spectral statistics whose 
empirical covariances estimate the quadratic covariation matrix. We there- 
fore prefer a localised method of moments (LMM) for these spectral statistics 
where for an adaptive version the theoretically optimal weights are deter- 
mined in a pre-estimation step, in analogy with the classical (multi-step) 
GMM (generalised method of moments) approach by Hansen [11]. 

As motivated in Reiß [19], consider the $L^2([0,1])$-orthonormal system
$(\varphi_{jk})$ and its antiderivatives $(\Phi_{jk})$:

$$\varphi_{jk}(t) = \sqrt{2/h}\,\cos\big(j\pi h^{-1}(t - kh)\big)\,\mathbb{1}_{[kh,(k+1)h]}(t), \quad j \ge 1, \tag{3.1a}$$

$$\Phi_{jk}(t) = \frac{\sqrt{2h}}{j\pi}\,\sin\big(j\pi h^{-1}(t - kh)\big)\,\mathbb{1}_{[kh,(k+1)h]}(t), \quad j \ge 1. \tag{3.1b}$$

In each component local spectral statistics are defined by

$$S_{jk}^{(l)} := \pi j h^{-1} \int_{kh}^{(k+1)h} \varphi_{jk}(t)\,dY^{(l)}(t), \quad j \ge 1,\ k = 0, \dots, h^{-1}-1,\ 1 \le l \le d, \tag{3.2}$$

from the continuous-time experiment $\mathcal{E}_1$. In order to design our estimator,
we consider a locally constant approximation of the general non-parametric
model.
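As a quick numerical sanity check of the basis used for the spectral statistics, the sketch below assumes the normalisation $\varphi_{jk}(t) = \sqrt{2/h}\cos(j\pi h^{-1}(t-kh))$ on the block, with antiderivative $\Phi_{jk}(t) = \sqrt{2h}(j\pi)^{-1}\sin(j\pi h^{-1}(t-kh))$. The helper names `phi`, `Phi` and `inner` are hypothetical.

```python
import numpy as np

def phi(j, k, h, t):
    """Cosine basis function on the block [kh, (k+1)h] (assumed normalisation)."""
    inside = (t >= k * h) & (t <= (k + 1) * h)
    return np.sqrt(2 / h) * np.cos(j * np.pi * (t - k * h) / h) * inside

def Phi(j, k, h, t):
    """Antiderivative of phi vanishing at both block end points."""
    inside = (t >= k * h) & (t <= (k + 1) * h)
    return np.sqrt(2 * h) / (j * np.pi) * np.sin(j * np.pi * (t - k * h) / h) * inside

def inner(f, g, a, b, m=20_001):
    """L2 inner product on [a, b] by the trapezoidal rule."""
    t = np.linspace(a, b, m)
    y = f(t) * g(t)
    return np.sum(y[:-1] + y[1:]) * (t[1] - t[0]) / 2
```

Under this normalisation $\|\varphi_{jk}\|_{L^2} = 1$ and $\|\Phi_{jk}\|_{L^2}^2 = h^2/(\pi^2 j^2)$, which is what makes the factor $\pi j h^{-1}$ in the spectral statistics the natural rescaling of the signal part.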

Definition 3.1. Set $\bar f_h(t) := h^{-1} \int_{kh}^{(k+1)h} f(s)\,ds$ for $t \in [kh, (k+1)h)$,
$k \in \mathbb{N}_0$, a function $f$ on $[0,1]$ and $h \in (0,1)$. Assume $h^{-1} \in \mathbb{N}$ and let
$X_t^h = X_0 + \int_0^t \bar\Sigma_h^{1/2}(s)\,dB_s$ with a d-dimensional standard Brownian motion
$B$. Define the process

$$(\mathcal{E}_2)\qquad dY_t = X_t^h\,dt + \mathrm{diag}\Big(\sqrt{\overline{(H_{n,l}^2)}_h(t)}\Big)_{1\le l\le d}\,dW_t, \quad t \in [0,1],$$

where $W$ is a standard Brownian motion independent of $B$ and with noise
level (1.1). The statistical model generated by the observations from $(\mathcal{E}_2)$ for
$\Sigma \in H^\beta(R)$ is denoted by $\mathcal{E}_2((n_l)_{1\le l\le d}, h, \beta, R)$.

In experiment $\mathcal{E}_2$ we thus observe a process with a covolatility matrix
which is constant on each block $[kh, (k+1)h)$, $k = 0, 1, \dots, h^{-1}-1$, and
corrupted by noise of block-wise constant magnitude. Our approach is founded
on the idea that for small block sizes $h$ and sufficient regularity this piecewise
constant approximation is close to $\mathcal{E}_1$.

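A basic ingredient to derive the covariance structure are the joint moments
for a d-dimensional random vector $X \sim N(0, \Omega)$. For $(l, r, p, q) \in \{1, \dots, d\}^4$
we have by Isserlis [13]:

$$\mathbb{E}\big[X^{(l)} X^{(r)} X^{(p)} X^{(q)}\big] = \Omega_{lr}\Omega_{pq} + \Omega_{lp}\Omega_{rq} + \Omega_{lq}\Omega_{rp}, \tag{3.3}$$

in particular $\mathrm{Var}((X^{(l)})^2) = 2\Omega_{ll}^2$, $\mathrm{Var}(X^{(l)}X^{(p)}) = \Omega_{ll}\Omega_{pp} + \Omega_{lp}^2$.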

The LMM estimator is built from the data in experiment $\mathcal{E}_1$, but designed
for the block-wise parametric model $\mathcal{E}_2$. In $\mathcal{E}_2$, the $L^2$-orthogonality of $(\varphi_{jk})$
as well as that of $(\Phi_{jk})$ imply (cf. Reiß [19] in the scalar case)

$$S_{jk} \sim N(0, C_{jk}) \quad\text{independent for all } (j,k) \tag{3.4}$$

with covariance matrix

$$C_{jk} = \Sigma^{kh} + \pi^2 j^2 h^{-2}\,\mathrm{diag}\big((H_{n,l}^{kh})^2\big)_{1\le l\le d}, \quad \Sigma^{kh} = \bar\Sigma_h(kh), \quad H_{n,l}^{kh} = \big(\overline{(H_{n,l}^2)}_h(kh)\big)^{1/2}. \tag{3.5}$$

In the multivariate central limit theorem we are facing covariances between
entries of the covariation matrix estimator, which we shall formalise by
interpreting matrices as vectors: for a matrix $A \in \mathbb{R}^{d\times d}$ we consider the vector
of its entries

$$\mathrm{vec}(A) := (A_{11}, A_{21}, \dots, A_{d1}, A_{12}, A_{22}, \dots, A_{d2}, \dots, A_{d(d-1)}, A_{dd})^\top \in \mathbb{R}^{d^2}.$$

The natural estimator for $\mathrm{vec}(C_{jk})$ is the empirical covariance $\mathrm{vec}(S_{jk}S_{jk}^\top)$.

We employ Kronecker (tensor) product calculus, where $A \otimes B \in \mathbb{R}^{d^2\times d^2}$ for
$A, B \in \mathbb{R}^{d\times d}$ is given by

$$(A \otimes B)_{p(d-1)+q,\,p'(d-1)+q'} = A_{pp'}B_{qq'}, \quad p, q, p', q' = 1, \dots, d.$$

We evaluate the covariance matrix of $\mathrm{vec}(S_{jk}S_{jk}^\top)$ in model $\mathcal{E}_2$ by introducing

$$\mathcal{Z} = \mathrm{COV}(\mathrm{vec}(ZZ^\top)) \quad\text{for } Z \sim N(0, E_d) \tag{3.6}$$

and applying the rule $\mathrm{vec}(ABC) = (C^\top \otimes A)\,\mathrm{vec}(B)$, see e.g. Fackler [9]:

$$\mathrm{COV}_{\mathcal{E}_2}\big(\mathrm{vec}(S_{jk}S_{jk}^\top)\big) = \mathrm{COV}\big(\mathrm{vec}(C_{jk}^{1/2} ZZ^\top C_{jk}^{1/2})\big) = (C_{jk} \otimes C_{jk})\,\mathcal{Z}. \tag{3.7}$$

We have used the commutativity of $\mathcal{Z}$ with $(C_{jk} \otimes C_{jk})^{1/2} = (C_{jk}^{1/2} \otimes C_{jk}^{1/2})$,
which is easily checked using the actual form of $\mathcal{Z}$ derived from (3.3),

$$\mathcal{Z}_{p(d-1)+q,\,p'(d-1)+q'} = (1 + \delta_{p,q})\,\delta_{\{p,q\},\{p',q'\}}, \quad p, q, p', q' = 1, \dots, d,$$

or the equivalent property $\mathcal{Z}\,\mathrm{vec}(A) = \mathrm{vec}(A + A^\top)$ for all $A \in \mathbb{R}^{d\times d}$. Let us
further introduce the Fisher information-type matrices

$$I_{jk} := C_{jk}^{-1} \otimes C_{jk}^{-1}, \qquad I_k := \sum_{j=1}^{\infty} I_{jk}, \qquad j \ge 1,\ k = 0, \dots, h^{-1}-1.$$
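The Kronecker identities used for the covariance of $\mathrm{vec}(S_{jk}S_{jk}^\top)$ can be checked numerically. The sketch below is an illustration under stated assumptions: it uses zero-based, row-major indexing (entry $(p,q)$ sits at position `p*d + q`), which is one consistent convention and not necessarily the paper's, and the helper names `z_matrix` and `cov_vec_outer` are hypothetical. The exact covariance is computed from Isserlis' formula (3.3).

```python
import numpy as np

def z_matrix(d):
    """Matrix with Z vec(A) = vec(A + A^T); entry (1 + delta_pq) if {p,q} = {a,b}."""
    Z = np.zeros((d * d, d * d))
    for p in range(d):
        for q in range(d):
            for a in range(d):
                for b in range(d):
                    if {p, q} == {a, b}:
                        Z[p * d + q, a * d + b] = 1 + (p == q)
    return Z

def cov_vec_outer(C):
    """Exact Cov(vec(S S^T)) for S ~ N(0, C), entry by entry via Isserlis."""
    d = C.shape[0]
    out = np.empty((d * d, d * d))
    for p in range(d):
        for q in range(d):
            for a in range(d):
                for b in range(d):
                    # Cov(S_p S_q, S_a S_b) = C_pa C_qb + C_pb C_qa
                    out[p * d + q, a * d + b] = C[p, a] * C[q, b] + C[p, b] * C[q, a]
    return out
```

With these helpers, `cov_vec_outer(C)` agrees with `np.kron(C, C) @ z_matrix(d)`, and `z_matrix(d)` commutes with `np.kron(C, C)` for symmetric `C`.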

Our local method of moments estimator with oracle weights $\mathrm{LMM}^{(n)}_{or}$ uses
that on each block a natural second moment estimator of $\Sigma^{kh}$ is given as a
convex combination of the bias-corrected empirical covariances:

$$\mathrm{LMM}^{(n)}_{or} := \sum_{k=0}^{h^{-1}-1} h \sum_{j=1}^{\infty} W_{jk}\,\mathrm{vec}\Big(S_{jk}S_{jk}^\top - \pi^2 j^2 h^{-2}\,\mathrm{diag}\big((H_{n,l}^{kh})^2\big)_{1\le l\le d}\Big). \tag{3.8}$$

The optimal weight matrices $W_{jk}$ in the oracle case are obtained as

$$W_{jk} := I_k^{-1} I_{jk} \in \mathbb{R}^{d^2\times d^2}. \tag{3.9}$$

Note that $C_{jk}$, $I_{jk}$, $I_k$ and $W_{jk}$ all depend on $(n_l)_{1\le l\le d}$ and $h$, which is
omitted in the notation. Finally, observe that (3.5) and $\sum_j W_{jk} = E_{d^2}$ imply
that $\mathrm{LMM}^{(n)}_{or}$ is unbiased under $\mathcal{E}_2$.
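The oracle weighting on a single block can be sketched as follows. This is an illustration rather than the authors' implementation: the frequency sum is truncated at a finite `J`, the block noise levels are passed as squared values `noise_sq` (standing in for $(H_{n,l}^{kh})^2$), and the function names are hypothetical.

```python
import numpy as np

def oracle_weights(sigma, noise_sq, h, J):
    """Weights W_j = I_k^{-1} I_j with I_j = C_j^{-1} (x) C_j^{-1}, j = 1..J."""
    # Block covariances C_j = Sigma + pi^2 j^2 h^{-2} diag(noise_sq), cf. (3.5).
    C = [sigma + (np.pi * j / h) ** 2 * np.diag(noise_sq) for j in range(1, J + 1)]
    I_j = [np.kron(np.linalg.inv(c), np.linalg.inv(c)) for c in C]
    I_k = sum(I_j)
    W = [np.linalg.solve(I_k, Ij) for Ij in I_j]
    return W, C

def lmm_block(S_outer, weights, noise_sq, h):
    """Weighted sum of bias-corrected matrices; S_outer[j-1] plays S_j S_j^T."""
    d = len(noise_sq)
    est = np.zeros(d * d)
    for j, (W, M) in enumerate(zip(weights, S_outer), start=1):
        bias = (np.pi * j / h) ** 2 * np.diag(noise_sq)
        est += W @ (M - bias).reshape(-1)
    return est.reshape(d, d)
```

Because the truncated weights still sum to the identity, feeding the expected values $C_j$ in place of $S_jS_j^\top$ returns $\Sigma$ exactly, which mirrors the unbiasedness statement for the block-wise constant model.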
3.2. Asymptotic properties of the estimators. We formulate the main
result that the oracle estimator (3.8) and also a fully adaptive version for the
integrated volatility matrix satisfy central limit theorems.

Theorem 3.2. Let Assumptions 2.1-($\alpha$), 2.2(ii-$\alpha$) and 2.2(iii-$\underline{\Sigma}$) with
$\alpha > 1/2$ hold true for observations from model $\mathcal{E}_1$. The oracle estimator
(3.8) yields a consistent estimator for $\mathrm{vec}(\int_0^1 \Sigma(s)\,ds)$ as $n_{\min} \to \infty$ and
$h = h_0 n_{\min}^{-1/2}$ with $h_0 \to \infty$. Moreover, if $n_{\max} = O(n_{\min}^2)$ and $h = o(n_{\max}^{-1/4})$,
then a multivariate central limit theorem holds:

$$I_n^{1/2}\Big(\mathrm{LMM}^{(n)}_{or} - \mathrm{vec}\Big(\int_0^1 \Sigma(s)\,ds\Big)\Big) \xrightarrow{\ \mathcal{L}\ } N(0, \mathcal{Z}) \quad\text{in } \mathcal{E}_1 \tag{3.10}$$

with $\mathcal{Z}$ from (3.6) and $I_n^{-1} = \sum_{k=0}^{h^{-1}-1} h^2 I_k^{-1}$.

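We give a typical illustration with convergence rate $n_{\min}^{1/4}$ and asymptotic
covariance matrix.

Corollary 3.3. Under the assumptions of Theorem 3.2 suppose
$n_{\min}/n_p \to \nu_p \in (0,1]$ for $p = 1, \dots, d$ and introduce $\mathcal{H}(t) =
\mathrm{diag}(\eta_p \nu_p^{1/2} F_p'(t)^{-1/2})_p \in \mathbb{R}^{d\times d}$ and $\Sigma_{\mathcal{H}}^{1/2} := \mathcal{H}(\mathcal{H}^{-1}\Sigma\mathcal{H}^{-1})^{1/2}\mathcal{H}$. Then

$$n_{\min}^{1/4}\Big(\mathrm{LMM}^{(n)}_{or} - \mathrm{vec}\Big(\int_0^1 \Sigma(s)\,ds\Big)\Big) \xrightarrow{\ \mathcal{L}\ } N(0, I^{-1}\mathcal{Z}) \quad\text{in } \mathcal{E}_1 \tag{3.11}$$

with

$$I^{-1} = 2\int_0^1 \big(\Sigma \otimes \Sigma_{\mathcal{H}}^{1/2} + \Sigma_{\mathcal{H}}^{1/2} \otimes \Sigma\big)(t)\,dt.$$

In particular, the entries satisfy for $p, q = 1, \dots, d$

$$n_{\min}^{1/4}\Big(\big(\mathrm{LMM}^{(n)}_{or}\big)_{p(d-1)+q} - \int_0^1 \Sigma_{pq}(s)\,ds\Big) \xrightarrow{\ \mathcal{L}\ } N\Big(0,\ 2\int_0^1 \big(\Sigma_{pp}(\Sigma_{\mathcal{H}}^{1/2})_{qq} + \Sigma_{qq}(\Sigma_{\mathcal{H}}^{1/2})_{pp} + 2\Sigma_{pq}(\Sigma_{\mathcal{H}}^{1/2})_{pq}\big)(t)\,dt\Big). \tag{3.12}$$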

The variance (3.12) will coincide with the lower bound obtained in Section
4 below. The local noise level in $\mathcal{H}(t)$ depends on the observational noise
level $\eta_p$ and the local sample size $\nu_p^{-1} F_p'(t)$, $p = 1, \dots, d$, after normalisation
by $n_{\min}$. It is easy to see that in the case $n_{\min}/n_p \to 0$ the asymptotic
variance vanishes for all entries $(p,q)$, $q = 1, \dots, d$. In general, also all other
asymptotic variances are then reduced because the estimator profits from
correlation. In Section 4.2 the effect of $\mathcal{H}$ on $\Sigma_{\mathcal{H}}^{1/2}$ is discussed and illustrated
in Figure 1. We infer the structure of the asymptotic covariance matrix using
block-wise diagonalisation in Appendix B.

For a feasible estimator the optimal weight matrices $W_{jk} = W_j(\Sigma^{kh})$ and
the information-type matrices $I_{jk} = I_j(\Sigma^{kh})$ are estimated in a preliminary
step from the same data. To reduce variability in the estimate, a coarser
grid of $r^{-1}$ equidistant intervals, $r/h \in \mathbb{N}$, is employed for $W_{jk}$. As derived
in Bibinger and Reiß [4] for supremum norm loss and extended to $L^1$-loss
and Besov regularity using the $L^1$-modulus of continuity as in the case of
wavelet estimators (Cor. 3.3.1 in Cohen [7]), a preliminary estimator $\hat\Sigma(t)$
of the instantaneous covolatility matrix $\Sigma(t)$ exists with

$$\|\hat\Sigma - \Sigma\|_{L^1} = O_P\big(n_{\min}^{-\alpha/(4\alpha+2)}\big) \tag{3.13}$$

for $\Sigma \in B^\alpha_{1,\infty}([0,1])$. For $k$ with $kh \in [mr, (m+1)r)$ we set

$$\hat W_{jk} = W_j(\hat\Sigma^{mr}), \quad \hat I_{jk} = I_j(\hat\Sigma^{kh}) \quad\text{with}\quad \hat\Sigma^{mr} = \bar{\hat\Sigma}_r(mr),\ \hat\Sigma^{kh} = \bar{\hat\Sigma}_h(kh).$$

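The quadratic covariation matrix estimator with adaptive weights is then
given by

$$\mathrm{LMM}^{(n)}_{ad} := \sum_{k=0}^{h^{-1}-1} h \sum_{j=1}^{\infty} \hat W_{jk}\,\mathrm{vec}\Big(S_{jk}S_{jk}^\top - \pi^2 j^2 h^{-2}\,\mathrm{diag}\big((H_{n,l}^{kh})^2\big)_{1\le l\le d}\Big). \tag{3.14}$$

We estimate the total covariance matrix via

$$\hat I_n^{-1} = \sum_{k=0}^{h^{-1}-1} h^2 \Big(\sum_{j=1}^{\infty} \hat I_{jk}\Big)^{-1}. \tag{3.15}$$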

For $j \to \infty$ the weights $W_j(\Sigma)$ and the matrices $I_j(\Sigma)$ decay like $j^{-4}$ in
norm, compare Lemma C.1 below, such that in practice a finite sum over
frequencies $j$ suffices. By a tight bound on the derivatives of $\Sigma \mapsto W_j(\Sigma)$ we
show in Appendix C.4 the following general result.

Theorem 3.4. Suppose $\Sigma \in B^\alpha_{1,\infty}([0,1])$ for $\alpha \in (1/2, 1]$ satisfying
$\alpha/(2\alpha+1) > \log(n_{\max})/\log(n_{\min}) - 1$. Choose $h, r \to 0$ such that $h_0 =
h n_{\min}^{1/2} \asymp \log(n_{\min})$ and $n_{\min}^{-\alpha/(2\alpha+1)} \lesssim r \lesssim (n_{\min}/n_{\max})^{1/2}$, $h^{-1}, r^{-1}, r/h \in
\mathbb{N}$. If the pilot estimator $\hat\Sigma$ satisfies (3.13), then under the conditions of
Theorem 3.2 the adaptive estimator (3.14) satisfies

$$\hat I_n^{1/2}\Big(\mathrm{LMM}^{(n)}_{ad} - \mathrm{vec}\Big(\int_0^1 \Sigma(s)\,ds\Big)\Big) \xrightarrow{\ \mathcal{L}\ } N(0, \mathcal{Z}), \tag{3.16}$$

with $\hat I_n$ from (3.15).

Moreover, Corollary 3.3 applies equally to the adaptive estimator (3.14).

Since the estimated $\hat I_n$ appears in the CLT, we have obtained a feasible
limit theorem and (asymptotic) inference statements are immediate.

Some assumptions of Theorem 3.4 are tighter than for the oracle estima- 
tor. To some extent this is for the sake of clarity. Here, we have restricted 
Assumption 2.2(ii-$\alpha$) to the Besov-regular part. A generalisation of the pilot
estimator to martingales seems feasible, but is non-standard and might re- 
quire additional conditions. We have also proposed a rather concrete choice 
of h and r, less is used in the proof, see e.g. (C.3) below. 

The lower bound for $\alpha$ in terms of the sample-size ratio $n_{\max}/n_{\min}$ is due
to bounding norms of (estimated) information-type matrices separately. For
$\alpha = 1$ (bounded variation case) the restriction imposes $n_{\max}$ to be somewhat
smaller than $n_{\min}^{4/3}$. By the Sobolev embedding $B^1_{1,\infty} \subseteq H^\beta$ for all $\beta < 1/2$
the restriction $n_{\max} = o(n_{\min}^{1+\beta})$ from Theorem 2.4 is clearly also satisfied in
this case.

It is not clear whether a more elaborate analysis can avoid these restric- 
tions. Still, to the best of our knowledge, a feasible CLT for asymptotically 
separating sample sizes has not been obtained before. The inclusion of
possible jumps in $\Sigma$ by measuring regularity only in the Besov scale is already
in the scalar case a significant improvement over Reiß [19].

4. Semiparametric efficiency. 

4.1. Semiparametric Cramér-Rao bound. We shall derive an efficiency
bound for the following basic case of observation model $(\mathcal{E}_1)$:

$$dY_t = X_t\,dt + \frac{1}{\sqrt{n}}\,dW_t, \quad t \in [0,1], \tag{4.1}$$

where

$$\Sigma(t) = \Sigma_0(t) + \varepsilon H(t), \qquad \Sigma_0(t)^{1/2} = O(t)^\top \Lambda(t) O(t). \tag{4.2}$$

We assume $\Sigma_0(t)$ and $H(t)$ to be known symmetric matrices, $O(t)$ orthogonal
matrices, $\Lambda(t) = \mathrm{diag}(\lambda_1(t), \dots, \lambda_d(t))$ diagonal and consider $\varepsilon \in [-1, 1]$ as
unknown parameter. Furthermore, we require Assumption 2.2(iii-$\underline{\Sigma}$) for all
$\Sigma$. Finally, we impose throughout this section the regularity assumption that
the matrix functions $O(t)$, $H(t)$, $\Lambda(t)$ are continuously differentiable.

The key idea is to transform the observation of $dY_t$ in such a manner that
the white noise part remains invariant in law while for the central parameter
$\Sigma(t) = \Sigma_0(t)$ the process $X$ is transformed to a process with independent
coordinates and constant volatility. It turns out that this can only be achieved
at the cost of an additional drift in the signal. The construction first
rotates the observations via $O(t)$, which diagonalises $\Sigma_0(t)$, and then applies a
coordinate-wise time-transformation, corrected by a multiplication term to
ensure $L^2$-isometry such that the white noise remains law-invariant.
We introduce the coordinate-wise time changes by

$$r_i(t) = \frac{\int_0^t \lambda_i(s)\,ds}{\int_0^1 \lambda_i(s)\,ds} \quad\text{and}\quad (T_r g)(t) := \big(g_1(r_1(t)), \dots, g_d(r_d(t))\big)^\top$$

for $g = (g_1, \dots, g_d) : \mathbb{R} \to \mathbb{R}^d$. Moreover, we set

$$\bar\Lambda := \int_0^1 \Lambda(s)\,ds, \qquad R'(t) := \bar\Lambda^{-1}\Lambda(t) = \mathrm{diag}(r_1'(t), \dots, r_d'(t)).$$



Lemma 4.1. By transforming $d\bar Y = T_r^{-1}\mathcal{M}_{(R')^{-1/2}O}\,dY$, the observation
model (4.1), (4.2) is equivalent to observing

$$d\bar Y(t) = S(t)\,dt + \frac{1}{\sqrt{n}}\,d\bar W(t) \quad\text{with} \tag{4.3}$$

$$S(t) = T_r^{-1}\Big(\int_0^{\cdot} \big((R')^{-1/2}O\big)'(s)\,X(s)\,ds + \int_0^{\cdot} \big((R')^{-1/2}O\big)(s)\,dX(s)\Big)(t)$$

for $t \in [0,1]$. At $\varepsilon = 0$ the observation $d\bar Y(t)$ reduces to

$$\Big(\int_0^t T_r^{-1}\big((R')^{-1}((R')^{-1/2}O)'X\big)(s)\,ds + \bar\Lambda \bar B(t)\Big)\,dt + \frac{1}{\sqrt{n}}\,d\bar W(t). \tag{4.4}$$

Here $\bar W$ and $\bar B$ are Brownian motions obtained from $W$ and $B$, respectively,
via rotation and time shift, as defined in (D.1) below.

If we may forget in (4.4) the first term, which is a drift term with respect to
the martingale part $\bar\Lambda \bar B(t)$, then the central observation is indeed a constant
volatility model in white noise. The lemma is proved in Appendix D.1.

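Let us introduce the multiplication operator $\mathcal{M}_A g := Ag$ and the
integration operator

$$Ig(t) = -\int_t^1 g(s)\,ds \quad\text{and its adjoint}\quad I^*g(t) = -\int_0^t g(s)\,ds.$$

The covariance operator $C_{n,\varepsilon}$ on $L^2([0,1], \mathbb{R}^d)$ obtained from observing the
differential in (4.3) is then given by

$$C_{n,\varepsilon} = T_r^*\,\mathcal{M}_{(R')^{1/2}O}\,I^*\,\mathcal{M}_{\Sigma_0+\varepsilon H}\,I\,\mathcal{M}_{O^\top (R')^{1/2}}\,T_r + n^{-1}\,\mathrm{Id}.$$

The covariance operator $Q_{n,\varepsilon}$ when omitting the drift part is given by

$$Q_{n,\varepsilon} = Q_{n,0} + \varepsilon\,I^* T_r^* \mathcal{M}_M T_r I \quad\text{with}\quad M(t) := \big((R')^{-1/2} O H O^\top (R')^{-1/2}\big)(t),$$

where for $\varepsilon = 0$ the one-dimensional Brownian motion covariance operator
$C_{BM}$ appears:

$$Q_{n,0} = \mathrm{diag}\big(\bar\Lambda_{ii}^2\,C_{BM} + n^{-1}\,\mathrm{Id}\big)_{1\le i\le d}, \qquad C_{BM} = I^*I.$$

We set $\dot C_0 = (C_{n,\varepsilon} - C_{n,0})/\varepsilon$ and $\dot Q_0 = (Q_{n,\varepsilon} - Q_{n,0})/\varepsilon$.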

Standard Fisher information calculations for the finite-dimensional
Gaussian scale model, e.g. [18, Chapter 6.6], transfer one-to-one to the
infinite-dimensional case of observing $N(0, Q_{n,\varepsilon})$ and yield as Fisher information for
the parameter $\varepsilon$ at $\varepsilon = 0$ the value

$$I_n = \tfrac{1}{2}\,\big\|Q_{n,0}^{-1/2}\,\dot Q_0\,Q_{n,0}^{-1/2}\big\|_{HS}^2,$$

because $Q_{n,0}^{-1/2} Q_{n,\varepsilon} Q_{n,0}^{-1/2}$ is differentiable at $\varepsilon = 0$ in Hilbert-Schmidt norm.
In Appendix D.2 we show by Hilbert-Schmidt calculus, the Feldman-Hájek
Theorem and the Girsanov Theorem that the models with and without drift
do not separate:

Lemma 4.2. We have

$$\limsup_{n\to\infty}\ \big\|Q_{n,0}^{-1/2}\,\dot Q_0\,Q_{n,0}^{-1/2} - C_{n,0}^{-1/2}\,\dot C_0\,C_{n,0}^{-1/2}\big\|_{HS} < \infty.$$

Lemma 4.2 implies that the drift only contributes the negligible order
$O(1) = o(\sqrt{n})$ to the Fisher information. By identifying the hardest
parametric subproblem for observations $N(0, Q_{n,\varepsilon})$ we thus establish in Appendix
D.3 a semiparametric Cramér-Rao bound for estimating any linear
functional of the covolatility matrix. Further classical asymptotic statements
like the local asymptotic minimax theorem would require the LAN-property
of the parametric subproblem.
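The finite-dimensional Gaussian scale computation referred to above can be illustrated numerically. The sketch assumes a family $N(0, Q_0 + \varepsilon \dot Q)$ with symmetric positive definite $Q_0$ and symmetric $\dot Q$; the function name `fisher_information_scale` is hypothetical.

```python
import numpy as np

def fisher_information_scale(Q0, Qdot):
    """Fisher information 0.5 * ||Q0^{-1/2} Qdot Q0^{-1/2}||_HS^2 at eps = 0."""
    w, V = np.linalg.eigh(Q0)                 # Q0 symmetric positive definite
    inv_root = V @ np.diag(w ** -0.5) @ V.T   # symmetric inverse square root
    M = inv_root @ Qdot @ inv_root
    return 0.5 * np.sum(M ** 2)               # squared Hilbert-Schmidt norm
```

In one dimension with $Q_0 = v$ and $\dot Q = 1$ this returns $1/(2v^2)$, the familiar Fisher information for the variance of a centred normal, and in general it agrees with $\tfrac12\,\mathrm{trace}((Q_0^{-1}\dot Q)^2)$.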

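Theorem 4.3. For a continuous matrix-valued function $A : [0,1] \to
\mathbb{R}^{d\times d}$ consider the estimation of

$$\vartheta := \int_0^1 \langle A(t), \Sigma(t) \rangle_{HS}\,dt = \int_0^1 \sum_{i,j=1}^d A_{ij}(t)\,\Sigma_{ij}(t)\,dt \in \mathbb{R}. \tag{4.5}$$

Then a hardest parametric subproblem in model (4.1), (4.2) is obtained for
the perturbation of $\Sigma_0$ by

$$H^*(t) = \big(\Sigma_0 (A + A^\top) \Sigma_0^{1/2} + \Sigma_0^{1/2} (A + A^\top) \Sigma_0\big)(t).$$

There any estimator $\hat\vartheta_n$ of $\vartheta$ which is asymptotically unbiased in the sense
$\frac{d}{d\vartheta}\big(\mathbb{E}_\vartheta[\hat\vartheta_n] - \vartheta\big) \to 0$, satisfies as $n \to \infty$

$$\mathrm{Var}_{\varepsilon=0}(\hat\vartheta_n) \ge \frac{(2 + o(1))}{\sqrt{n}} \int_0^1 \big\langle (\Sigma_0 \otimes \Sigma_0^{1/2} + \Sigma_0^{1/2} \otimes \Sigma_0)\,\mathcal{Z}\,\mathrm{vec}(A),\ \mathcal{Z}\,\mathrm{vec}(A) \big\rangle(t)\,dt.$$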

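4.2. Discussion. The Cramér-Rao bound of Theorem 4.3 coincides with
the asymptotic variance obtained in Corollary 3.3 in the case $\mathcal{H}(t) = E_d$.
For different, but constant in time noise levels $\mathcal{H}$, we can apply a rescaling
argument and replace in the lower bound model $\Sigma(t)$ by $\mathcal{H}^{-1}\Sigma(t)\mathcal{H}^{-1}$ and
$A(t)$ by $\mathcal{H}A(t)\mathcal{H}$. This gives the more general Cramér-Rao bound

$$\mathrm{Var}_{\varepsilon=0}(\hat\vartheta_n) \ge \frac{(2 + o(1))}{\sqrt{n}} \int_0^1 \big\langle (\Sigma_0 \otimes \Sigma_{0,\mathcal{H}}^{1/2} + \Sigma_{0,\mathcal{H}}^{1/2} \otimes \Sigma_0)\,\mathcal{Z}\,\mathrm{vec}(A),\ \mathcal{Z}\,\mathrm{vec}(A) \big\rangle(t)\,dt$$

with $\Sigma_{0,\mathcal{H}}^{1/2} = \mathcal{H}(\mathcal{H}^{-1}\Sigma_0\mathcal{H}^{-1})^{1/2}\mathcal{H}$, as in Corollary 3.3. If $\mathcal{H}(t)$ depends on
$t$, rescaling generates another drift term, but if it varies smoothly in $t$, we
expect to obtain again a lower bound that matches the asymptotic variance
of our estimator. Let us discuss the efficient asymptotic variance AVAR
further, concentrating on the homogeneous case $\mathcal{H}(t) = E_d$.
The efficient asymptotic variance for estimating $\int_0^1 \Sigma_{pp}(t)\,dt$ is

$$\mathrm{AVAR}\Big(\int_0^1 \Sigma_{pp}(t)\,dt\Big) = 8\int_0^1 \Sigma_{pp}(t)\,\big(\Sigma^{1/2}(t)\big)_{pp}\,dt.$$

For the asymptotic variance of estimating $\int_0^1 \Sigma_{pq}(t)\,dt$ we obtain

$$2\int_0^1 \big((\Sigma^{1/2})_{pp}\Sigma_{qq} + (\Sigma^{1/2})_{qq}\Sigma_{pp} + 2(\Sigma^{1/2})_{pq}\Sigma_{pq}\big)(t)\,dt.$$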

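Let us calculate specific examples. First, in the case $d = 1$, $\Sigma = \sigma^2$ this
simplifies to

$$\mathrm{AVAR}\Big(\int_0^1 \sigma^2(t)\,dt\Big) = 8\int_0^1 \sigma^3(t)\,dt,$$

which agrees with the efficiency bound in Reiß [19]. For $p \ne q$ in the
independent case $\Sigma = \mathrm{diag}(\sigma_p^2)_{1\le p\le d}$ we find

$$\mathrm{AVAR}\Big(\int_0^1 \Sigma_{pq}(t)\,dt\Big) = 2\int_0^1 (\sigma_p^2\sigma_q + \sigma_p\sigma_q^2)(t)\,dt.$$

In the case $d = 2$ with spot volatilities $\sigma_1^2(t) = \sigma_2^2(t) = \sigma^2(t)$ and general
correlation $\rho(t)$, i.e. $\sigma_{12}(t) = (\rho\sigma_1\sigma_2)(t)$, we obtain

$$\mathrm{AVAR}\Big(\int_0^1 \sigma_1^2(t)\,dt\Big) = 4\int_0^1 \sigma^3(t)\big(\sqrt{1+\rho(t)} + \sqrt{1-\rho(t)}\big)\,dt,$$

$$\mathrm{AVAR}\Big(\int_0^1 \sigma_{12}(t)\,dt\Big) = 2\int_0^1 \sigma^3(t)\big((1+\rho(t))^{3/2} + (1-\rho(t))^{3/2}\big)\,dt.$$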
Fig 1. Asymptotic variances of LMM for volatility $\sigma_1^2$ (left) and covolatility $\sigma_{12}$ (right)
plotted against correlation $\rho$ and noise level $\eta_2$ (constant in time).

With time-constant parameters these bounds decay for $\sigma_1^2$ (resp. grow for
$\sigma_{12}$) in $|\rho|$ from $8\sigma^3$ (resp. $4\sigma^3$) at $\rho = 0$ to $4\sqrt{2}\,\sigma^3$ at $|\rho| = 1$ for both cases.
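The closed-form examples of this section can be reproduced numerically from the general entry-wise expression $2(\Sigma_{pp}S_{qq} + \Sigma_{qq}S_{pp} + 2\Sigma_{pq}S_{pq})$ with $S = \mathcal{H}(\mathcal{H}^{-1}\Sigma\mathcal{H}^{-1})^{1/2}\mathcal{H}$. The sketch below assumes time-constant parameters, so that the time integral is trivial; the helper names `sym_sqrt` and `avar_entry` are hypothetical.

```python
import numpy as np

def sym_sqrt(M):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def avar_entry(sigma, eta, p, q):
    """Asymptotic variance for entry (p, q) with constant Sigma and noise levels eta."""
    H = np.diag(eta)
    Hinv = np.diag(1.0 / eta)
    S = H @ sym_sqrt(Hinv @ sigma @ Hinv) @ H
    return 2 * (sigma[p, p] * S[q, q] + sigma[q, q] * S[p, p]
                + 2 * sigma[p, q] * S[p, q])
```

For $d = 2$ with equal volatilities and unit noise levels this reproduces $4\sigma^3(\sqrt{1+\rho} + \sqrt{1-\rho})$ for the volatility and $2\sigma^3((1+\rho)^{3/2} + (1-\rho)^{3/2})$ for the covolatility. For perfectly correlated components with unit volatilities it returns $8/\sqrt{d}$ for a diagonal entry.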

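All the preceding examples can be worked out for different noise levels in
$\mathcal{H}$, let us just highlight the bound for $\mathrm{AVAR}(\int_0^1 \Sigma_{pq}(t)\,dt)$:

$$2\int_0^1 \big((\Sigma_{\mathcal{H}}^{1/2})_{pp}\Sigma_{qq} + (\Sigma_{\mathcal{H}}^{1/2})_{qq}\Sigma_{pp} + 2(\Sigma_{\mathcal{H}}^{1/2})_{pq}\Sigma_{pq}\big)(t)\,dt,$$

which matches the asymptotic variances obtained in Corollary 3.3. In general
all noise levels enter for a fixed entry $(p,q)$ via the matrix root $(\mathcal{H}^{-1}\Sigma\mathcal{H}^{-1})^{1/2}$
which only in the case of a diagonal covolatility matrix $\Sigma = \mathrm{diag}(\sigma_p^2)_p$
decouples as $\mathrm{diag}(\eta_p^{-1}\sigma_p)_p$ and where the bound simplifies to

$$p \ne q:\ 2\int_0^1 (\eta_p\sigma_p\sigma_q^2 + \eta_q\sigma_q\sigma_p^2)(t)\,dt; \qquad p = q:\ 8\int_0^1 (\eta_p\sigma_p^3)(t)\,dt.$$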



Figure 1 illustrates the general dependence of the asymptotic variance on
the noise level via $\Sigma_{\mathcal{H}}^{1/2}$ in the case $d = 2$. The two volatilities are $\sigma_1^2 = \sigma_2^2 = 1$,
the covolatility is $\sigma_{12} = \rho$ (constant in time) and the first noise level is $\eta_1 = 1$.
The left plot shows the asymptotic variance of the estimator of $\sigma_1^2$ as a
function of $\rho$ and $\eta_2$. We see the significant gain of using observations from
the other process for larger values of $\rho$. If the noise level $\eta_2$ for the second
process is small, then the asymptotic variance can even approach zero. The
plot on the right shows the same dependence for estimating the covolatility
$\sigma_{12}$. For comparable size of $\eta_2$ and $\eta_1$ the asymptotic variance increases
in $\rho$ which is explained by the fact that also the value to be estimated
increases. For small values of $\eta_2$, however, the efficiency gain by exploiting
the correlation prevails.
For larger dimension $d$ the variance can even be of order $O(1/\sqrt{d})$; in the
concrete case where all volatilities and noise levels equal 1, the asymptotic
variance for estimating $\sigma_1^2$ can be reduced from 8 (using only observations
from the first component or if $\Sigma$ is diagonal) down to $8/\sqrt{d}$ (in case of perfect
correlation).

We can also investigate the estimation of the entire quadratic covariation 
matrix J S(i) dt and measure its loss with respect to the squared d x d- 
Hilbert-Schmidt norm. Summing up the variances for each entry, we obtain 
the asymptotic risk 



4 

h j 



trace(S 1/2 ) trace(S) + trace(S 3/2 )) (t) dt. 



This can be compared with the corresponding Hilbert-Schmidt norm error
$\frac{1}{n}(\operatorname{trace}(\Sigma)^2 + \operatorname{trace}(\Sigma^2))$
for the empirical covariance matrix in an i.i.d. Gaussian
$N(0,\Sigma)$-setting.
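
Both trace expressions arise as traces of Kronecker-structured covariance
matrices, which the following sketch verifies numerically. It assumes unit
noise levels, a constant $\Sigma$, the covariance form
$2(\Sigma\otimes\Sigma^{1/2} + \Sigma^{1/2}\otimes\Sigma)\mathcal{Z}$ for the
estimator and $(\Sigma\otimes\Sigma)\mathcal{Z}$ for a single Gaussian
observation $\operatorname{vec}(XX^\top)$; all names are illustrative.

```python
import numpy as np

def psd_power(m, power):
    """Fractional power of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.clip(w, 0.0, None) ** power) @ v.T

def symmetriser(d):
    """Matrix Z with Z vec(A) = vec(A + A^T)."""
    z = np.eye(d * d)
    for p in range(d):
        for q in range(d):
            z[p * d + q, q * d + p] += 1.0
    return z

def risk_from_traces(sigma):
    """4 (trace(Sigma^(1/2)) trace(Sigma) + trace(Sigma^(3/2)))."""
    return 4.0 * (np.trace(psd_power(sigma, 0.5)) * np.trace(sigma)
                  + np.trace(psd_power(sigma, 1.5)))

def risk_from_covariance(sigma):
    """Sum of all entry-wise variances, read off the Kronecker form."""
    root = psd_power(sigma, 0.5)
    d = sigma.shape[0]
    cov = 2.0 * (np.kron(sigma, root) + np.kron(root, sigma)) @ symmetriser(d)
    return np.trace(cov)

def empirical_covariance_risk(sigma):
    """trace(Sigma)^2 + trace(Sigma^2), the single-observation i.i.d. benchmark."""
    return np.trace(sigma) ** 2 + np.trace(sigma @ sigma)

if __name__ == "__main__":
    sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
    print(risk_from_traces(sigma), risk_from_covariance(sigma),
          empirical_covariance_risk(sigma))
```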

For nonlinear functionals the Cramér-Rao bound is obtained by linearisation.
Consider the prominent example of estimating power variations of the form
$\int_0^1 (\Sigma_{pq})^{p/2}(t)\,dt$ for some $p > 0$ ($p = 4$ yields the
so-called quarticity). Linearisation of the perturbation yields
$(\Sigma_\varepsilon)_{pq}^{p/2} = (\Sigma_0)_{pq}^{p/2} + \frac{p}{2}(\Sigma_0)_{pq}^{p/2-1}\varepsilon H_{pq} + o(\varepsilon)$
provided $(\Sigma_0)_{pq} > 0$. We thus consider
$A(t) = \frac{p}{2}(\Sigma_0(t))_{pq}^{p/2-1}(\delta_{(p,q),(p',q')})_{p',q'}$
and obtain the lower bound

$$\frac{p^2}{2}\int_0^1 (\Sigma_0)_{pq}^{p-2}\big((\Sigma_0^{1/2})_{pp}(\Sigma_0)_{qq} + (\Sigma_0^{1/2})_{qq}(\Sigma_0)_{pp} + 2(\Sigma_0^{1/2})_{pq}(\Sigma_0)_{pq}\big)(t)\,dt.$$

For $p = q$ this reduces to
$2p^2\int_0^1 ((\Sigma_0^{1/2})_{pp}(\Sigma_0)_{pp}^{p-1})(t)\,dt$, which is
independent of $\Sigma_0$ only if $p = 1/2$ and $\Sigma_0$ is diagonal. In that
case asymptotic equivalence with a homoskedastic Gaussian shift in terms of
the mean function $\Sigma(t)^{1/4}$ holds, derived by independence of
coordinates from Reiß [19], but for non-diagonal $\Sigma_0$ a
variance-stabilising transform or even an equivalence result is not apparent.

5. Implementation and numerical results. 

5.1. Discrete-time estimator. The construction to transfer discrete-time 
to continuous-time observations in the proof of Theorem 2.4 paves the way 
to the discrete approximation of the local spectral statistics (3.2). Using the 
interpolated process and integration by parts yields 

J v=l Jt v-\ tp — t v _ l 
Fig 2. Variances of estimators of $\sigma_1^2$ (left) and $\sigma_{12}$ (right)
in time-constant scenario ($n = 30{,}000$).



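Hence, for discrete-time observations from $(\mathcal{E}_0)$ we use the local
spectral statistics

$$S_{jk} = \pi j h^{-1}\Big(\sum_{v=1}^{n_l}\big(Y_v^{(l)} - Y_{v-1}^{(l)}\big)\,\Phi_{jk}\Big(\frac{t_{v-1}^{(l)} + t_v^{(l)}}{2}\Big)\Big)_{1\le l\le d}. \qquad (5.1)$$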

The noise terms in (3.5) translate from $\mathcal{E}_1$ to $\mathcal{E}_0$ via
substituting $\int_{kh}^{(k+1)h} (n_l F_l'(s))^{-1}\,ds$ by
$\sum_{kh \le t_v^{(l)} \le (k+1)h} (t_v^{(l)} - t_{v-1}^{(l)})^2$. The discrete
sum times $h^{-1}$ can be understood as a block-wise quadratic variation of
time in the spirit of Zhang et al. [20]. The bias is discretised analogously.
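
As an illustration, the block-wise quadratic variation of time can be computed
as follows. This is a minimal sketch, assuming sorted observation times on
$[0,1]$, blocks $[kh,(k+1)h)$ and a local squared noise level of the form
$\eta_l^2 h^{-1}\sum (t_v - t_{v-1})^2$; the function names are illustrative.

```python
import numpy as np

def blockwise_qv_of_time(times, h):
    """Sum of squared time increments per block [kh, (k+1)h).

    An increment is assigned to the block containing its right end point.
    """
    n_blocks = int(round(1.0 / h))
    increments = np.diff(times)
    block = np.minimum((times[1:] / h).astype(int), n_blocks - 1)
    return np.bincount(block, weights=increments ** 2, minlength=n_blocks)

def local_noise_level_sq(times, eta_sq, h):
    """Assumed discretisation of the squared local noise level on each block."""
    return eta_sq * blockwise_qv_of_time(times, h) / h

if __name__ == "__main__":
    # For n equidistant observations the result is close to eta^2 / n.
    t = np.linspace(0.0, 1.0, 1001)
    print(local_noise_level_sq(t, eta_sq=0.01, h=0.1))
```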

For the adaptive estimator we are in need of local estimates of $n_l F_l'$,
$\Sigma$ and estimators for $\eta_l^2$, $1 \le l \le d$. It is well known how
to estimate noise variances with faster $\sqrt{n}$-rates, see e.g. Zhang et
al. [20]. Local observation densities can be estimated with block-wise
quadratic variation of time as above, which then yield estimators
$\hat H_{n,l}^{kh}$ of $H_{n,l}^{kh}$ around time $kh$. Uniformly consistent
estimators for $\Sigma(t)$, $t \in [0,1]$, are feasible, e.g. averaging
spectral statistics for $j = 1,\dots,J$ over a set $\mathcal{K}_t$ of $K$
adjacent blocks containing $t$:

$$\hat\Sigma(t) = K^{-1}\sum_{k\in\mathcal{K}_t} J^{-1}\sum_{j=1}^{J}\Big(S_{jk}S_{jk}^\top - \pi^2 j^2 h^{-2}\operatorname{diag}\big((\hat H_{n,l}^{kh})^2\big)_l\Big). \qquad (5.2)$$



We refer to Bibinger and Reiß [4] for details on the non-parametric pilot
estimator with $J = 1$.
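
The pilot estimator can be sketched in a few lines. The code below is an
illustration under stated assumptions, not the authors' implementation: the
weight function is taken to be a sine on each block, normalised so that
$\pi j h^{-1}\Phi_{jk}(t) = \sqrt{2/h}\,\sin(j\pi h^{-1}(t - kh))$, the noise
variances $\eta_l^2$ are treated as known, and all function names are
hypothetical.

```python
import numpy as np

def spectral_statistics(times, obs, h, n_freq):
    """Discrete spectral statistics of one component, shape (n_freq, n_blocks).

    Assumes sine weights evaluated at the midpoints of the observation times.
    """
    n_blocks = int(round(1.0 / h))
    increments = np.diff(obs)
    mid = 0.5 * (times[:-1] + times[1:])
    block = np.minimum((mid / h).astype(int), n_blocks - 1)
    stats = np.empty((n_freq, n_blocks))
    for j in range(1, n_freq + 1):
        weights = np.sqrt(2.0 / h) * np.sin(j * np.pi * (mid / h - block))
        stats[j - 1] = np.bincount(block, weights=weights * increments,
                                   minlength=n_blocks)
    return stats

def local_noise_sq(times, eta_sq, h):
    """Squared local noise level via block-wise quadratic variation of time."""
    n_blocks = int(round(1.0 / h))
    block = np.minimum((times[1:] / h).astype(int), n_blocks - 1)
    qv_time = np.bincount(block, weights=np.diff(times) ** 2, minlength=n_blocks)
    return eta_sq * qv_time / h

def pilot_estimator(times, obs, eta_sq, h, n_freq):
    """Bias-corrected average over frequencies, one (d, d) matrix per block."""
    s = np.stack([spectral_statistics(t, y, h, n_freq)
                  for t, y in zip(times, obs)])            # (d, J, K)
    h_sq = np.stack([local_noise_sq(t, e, h)
                     for t, e in zip(times, eta_sq)])      # (d, K)
    d = s.shape[0]
    j = np.arange(1, n_freq + 1)
    outer = np.einsum("ajk,bjk->jkab", s, s)               # (J, K, d, d)
    bias = (np.pi * j / h)[:, None, None] ** 2 * h_sq.T[None, :, :]
    outer[:, :, np.arange(d), np.arange(d)] -= bias        # correct the diagonal
    return outer.mean(axis=0)                              # (K, d, d)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, rho, eta = 30_000, 0.5, 0.1
    t = np.linspace(0.0, 1.0, n + 1)
    chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    x = np.vstack([np.zeros((1, 2)),
                   np.cumsum(rng.standard_normal((n, 2)) @ chol.T, axis=0) / np.sqrt(n)])
    y = x + eta * rng.standard_normal(x.shape)
    est = pilot_estimator([t, t], [y[:, 0], y[:, 1]], [eta ** 2] * 2, h=0.1, n_freq=30)
    print(est.mean(axis=0))  # average over all blocks, rough estimate of Sigma
```

Averaging the returned block-wise matrices over $K$ adjacent blocks gives the
local estimate around a given time point; the components may have different,
non-synchronous observation times.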



5.2. Simulations. We examine the finite-sample properties of the LMM
for the case $d = 2$ in two scenarios. First, we compare the finite-sample
variance with the asymptotic variances from Sections 3 and 4, i.e. for a
parametric setup with $\eta_1 = \eta_2 = 0.1$, $\sigma_1 = \sigma_2 = 1$ and
constant correlation $\rho$. We simulate $n_1 = n_2 = 30{,}000$ synchronous
observations on $[0,1]$. For estimating $\sigma_1^2$ and $\sigma_{12} = \rho$,
Figure 2 displays the rescaled Monte-Carlo variance based on 20,000
replications of the oracle and adaptive LMM (LMM$_{or}$ and LMM$_{ad}$), as
well as the adaptive spectral estimator (SPEC$_{ad}$) by Bibinger and Reiß
[4], which relies on the same spectral approach, but uses only scalar
weighting instead of the full information matrix approach.
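
For reference, data for the time-constant scenario can be generated as
follows. This is a minimal sketch with an illustrative function name; it
assumes equidistant synchronous observation times, $|\rho| < 1$ and centred
Gaussian noise.

```python
import numpy as np

def simulate_constant_scenario(n, rho, eta, rng):
    """Two correlated Brownian motions on [0, 1] observed with i.i.d. noise.

    Returns the observation times, the latent signal and the noisy observations.
    """
    times = np.linspace(0.0, 1.0, n + 1)
    chol = np.linalg.cholesky(np.array([[1.0, rho], [rho, 1.0]]))
    increments = rng.standard_normal((n, 2)) @ chol.T / np.sqrt(n)
    signal = np.vstack([np.zeros((1, 2)), np.cumsum(increments, axis=0)])
    observed = signal + eta * rng.standard_normal(signal.shape)
    return times, signal, observed

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    t, x, y = simulate_constant_scenario(30_000, rho=0.7, eta=0.1, rng=rng)
    print(t.shape, x.shape, y.shape)
```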

In practice the pilot estimator from (5.2) for $J$ not too large performed
well. As configuration we use $h^{-1} = 10$, $J = 30$ and $K = 8$, which
turned out to be an accurate choice, but the estimators are reasonably robust
to alternative input choices. For the LMM of $\sigma_1^2$, we observe the
already familiar variance reduction effect associated with a growing signal
correlation $\rho$, while the simulation-based variances of both LMM$_{or}$
and LMM$_{ad}$ are close to their theoretical asymptotic counterpart (Theor).
The results for $\sigma_{12}$ underline the precision gains compared to
SPEC$_{ad}$ with univariate weights when $\rho$ increases.

Next, we consider a complex and realistic stochastic volatility setting that
relies on an extension of the widely-used Heston model, as e.g. employed by
Aït-Sahalia et al. [1], accounting for both leverage effects and an intraday
seasonality of volatility. The signal process for $l = 1,2$ evolves as

$$dX_t^{(l)} = \varphi_l(t)\,\sigma_l(t)\,dZ_t^{(l)}, \qquad d\sigma_l^2(t) = \alpha_l(\mu_l - \sigma_l^2(t))\,dt + \psi_l\,\sigma_l(t)\,dV_t^{(l)},$$

where $Z^{(l)}$ and $V^{(l)}$ are standard Brownian motions with
$dZ_t^{(1)}dZ_t^{(2)} = \rho\,dt$ and
$dZ_t^{(l)}dV_t^{(m)} = \delta_{l,m}\gamma_l\,dt$. $\varphi_l(t)$ is a
non-stochastic seasonal factor with $\int_0^1 \varphi_l^2(t)\,dt = 1$. The unit
time interval can represent one trading day, e.g. 6.5 hours or 23,400 seconds
at NYSE.
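
A simple way to simulate such a signal is an Euler scheme. The sketch below is
an illustration only: it assumes a flat seasonal factor $\varphi_l \equiv 1$
unless one is supplied, uses full truncation to keep the variance process
usable when it becomes negative, and builds the correlation structure with
independent residual drivers for $V^{(1)}$ and $V^{(2)}$, which requires
$\gamma^2 < 1 - \rho^2$. The function name and defaults are hypothetical.

```python
import numpy as np

def simulate_heston_pair(n, rho, rng, mu=1.0, alpha=6.0, psi=0.3, gamma=-0.3, phi=None):
    """Euler scheme for two Heston-type processes with leverage on [0, 1].

    phi: optional array of shape (n, 2) with seasonal factors on the grid;
    defaults to a flat factor (an assumption made for this sketch).
    """
    if gamma ** 2 >= 1.0 - rho ** 2:
        raise ValueError("need gamma^2 < 1 - rho^2 for this construction")
    dt = 1.0 / n
    phi = np.ones((n, 2)) if phi is None else np.asarray(phi, dtype=float)
    z = rng.standard_normal((n, 2))
    dz = np.sqrt(dt) * np.column_stack(
        [z[:, 0], rho * z[:, 0] + np.sqrt(1.0 - rho ** 2) * z[:, 1]])
    dw = np.sqrt(dt) * rng.standard_normal((n, 2))
    # dV_l is correlated gamma with dZ_l and uncorrelated with dZ_m, m != l.
    a = gamma / (1.0 - rho ** 2)
    b = -gamma * rho / (1.0 - rho ** 2)
    c = np.sqrt(1.0 - gamma ** 2 / (1.0 - rho ** 2))
    dv = a * dz + b * dz[:, ::-1] + c * dw
    var = np.empty((n + 1, 2))
    # Start from the stationary Gamma distribution of the variance process.
    var[0] = rng.gamma(shape=2.0 * alpha * mu / psi ** 2,
                       scale=psi ** 2 / (2.0 * alpha), size=2)
    x = np.zeros((n + 1, 2))
    for i in range(n):
        v_pos = np.maximum(var[i], 0.0)
        vol = np.sqrt(v_pos)
        x[i + 1] = x[i] + phi[i] * vol * dz[i]
        var[i + 1] = var[i] + alpha * (mu - v_pos) * dt + psi * vol * dv[i]
    return x, var

if __name__ == "__main__":
    x, var = simulate_heston_pair(23_400, rho=0.5, rng=np.random.default_rng(7))
    print(x[-1], var.mean(axis=0))
```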

We initialise the variance process $\sigma_l^2(t)$ by sampling from its
stationary distribution $\Gamma(2\alpha_l\mu_l/\psi_l^2, \psi_l^2/(2\alpha_l))$
and vary the value of the instantaneous signal correlation $\rho$, while
setting $(\mu_l, \alpha_l, \psi_l, \gamma_l) = (1, 6, 0.3, -0.3)$, $l = 1,2$,
which, under the stationary distribution, implies
$E[\int_0^1 \varphi_l^2(t)\,\sigma_l^2(t)\,dt] = 1$. The seasonal factor
$\varphi_l(t)$ is specified in terms of intraday volatility functions
estimated for S&P 500 equity data by the procedure in Andersen and Bollerslev
[2]. $\varphi_1(t)$ and $\varphi_2(t)$ are based on cross-sectional averages of
the 50 most and 50 least liquid stocks, respectively, which yields a
pronounced L-shape in both cases (see Figure 3). We add noise processes that
are i.i.d. $N(0, \eta_l^2)$ and mutually independent with
$\eta_l = 0.1\,(E[\int_0^1 \varphi_l^4(t)\,\sigma_l^4(t)\,dt])^{1/4}$, computed
under the stationary distribution of $\sigma_l^2(t)$. Finally, asynchronicity
effects are introduced by drawing observation times $t_i^{(l)}$,
$1 \le i \le n_l$, $l = 1,2$, from two independent Poisson processes with
intensities $\lambda_1 = 1$ and $\lambda_2 = 2/3$ such that, on average,
$n_1 = 23{,}400$ and $n_2 = 15{,}600$.

Fig 3. Non-stochastic volatility seasonality factors (left) and RMSE for
estimators of $\int_0^1 \varphi_1^2(t)\,\sigma_1^2(t)\,dt$ (right) in
stochastic volatility scenario.
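
Observation times of this kind can be drawn with a few lines of code. The
sketch below assumes that the intensities are per second over a trading day of
23,400 seconds rescaled to $[0,1]$, and uses the fact that the points of a
homogeneous Poisson process, given their number, are i.i.d. uniform; the
function name is illustrative.

```python
import numpy as np

def poisson_observation_times(intensity, horizon_seconds, rng):
    """Sorted arrival times of a homogeneous Poisson process, rescaled to [0, 1]."""
    count = rng.poisson(intensity * horizon_seconds)
    return np.sort(rng.uniform(0.0, 1.0, size=count))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t1 = poisson_observation_times(1.0, 23_400, rng)
    t2 = poisson_observation_times(2.0 / 3.0, 23_400, rng)
    print(len(t1), len(t2))
```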

As a representative example, Figure 3 depicts the root mean-squared errors
(RMSEs) based on 40,000 replications of the following estimators of
$\int_0^1 \varphi_1^2(t)\,\sigma_1^2(t)\,dt$: the oracle and adaptive LMM
using $h^{-1} = 20$, $J = 15$ and $K = 8$, the quasi-maximum likelihood (QML)
estimator by Aït-Sahalia et al. [1] as well as an oracle version of the
widely-used multivariate realised kernel (MRK$_{or}$) by Barndorff-Nielsen et
al. [3]. For the latter, we employ the
average univariate mean-squared error optimal bandwidth based on the true 
value of L tpf(t) af(t) dt, I = 1,2. Further, we include the theoretical vari- 
ance from the asymptotic theory (Theor) , which is computed as the variance 
(3.12) averaged across all replications. 

Three major results emerge. First, the LMM offers considerable precision 
gains when compared to both benchmarks. Second, a rising instantaneous 
signal correlation $\rho$ is associated with a declining RMSE of the LMM, which
is due to the decreasing variance, and thus confirms the findings from Section 
3 in a realistic setting. Finally, the adaptive LMM closely tracks its oracle 
counterpart. 

In summary, the simulation results show that the estimator has promising
properties even in settings which are more general than those assumed in
$(\mathcal{E}_1)$, allowing, for instance, for random observation times,
stochastic intraday volatility as well as leverage effects. Even if the
latter effects are not yet covered by our theory, the proposed estimator
seems to be quite robust to deviations from the idealised setting.
APPENDIX A: FROM DISCRETE TO CONTINUOUS EXPERIMENTS

Proof of Theorem 2.4. To establish Le Cam equivalence, we give a
constructive proof to transfer observations in $\mathcal{E}_0$ to the
continuous-time model $\mathcal{E}_1$ and the other way round. We bound the Le
Cam distance by estimates for the squared Hellinger distance between Gaussian
measures and refer to Section A.1 in [19] for information on Hellinger
distances between Gaussian measures and bounds with the Hilbert-Schmidt norm.
The crucial difference here is that linear interpolation is carried out for
non-synchronous irregular observation schemes. Consider the linear B-splines
or hat functions



$$b_{i,n}(t) = \mathbf{1}_{[\frac{i-1}{n},\frac{i+1}{n}]}(t)\,\min\Big(1 + n\Big(t - \frac{i}{n}\Big),\ 1 - n\Big(t - \frac{i}{n}\Big)\Big).$$



Define $b_i^l(t) := b_{i,n_l}(F_l(t))$, $1 \le i \le n_l$, $1 \le l \le d$,
which are warped spline functions satisfying
$b_i^l(t_v^{(l)}) = \delta_{i,v}$. A centred Gaussian process $\hat Y$ is
derived from linearly interpolating each component of $Y$:



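$$\hat Y_t^{(l)} = \sum_{i=1}^{n_l} Y_i^{(l)}\,b_i^l(t) = \sum_{i=1}^{n_l} X^{(l)}_{t_i^{(l)}}\,b_i^l(t) + \sum_{i=1}^{n_l} \varepsilon_i^{(l)}\,b_i^l(t). \qquad (A.1)$$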
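
The hat functions and their warped versions are easy to experiment with. The
following sketch checks the interpolation property numerically; the
distribution function $F(t) = t^2$ of the observation times and the inclusion
of the index $i = 0$ in the sum are assumptions made for this illustration,
and the function names are hypothetical.

```python
import numpy as np

def hat(i, n, t):
    """Linear B-spline with nodes (i-1)/n, i/n, (i+1)/n and peak value 1."""
    return np.clip(1.0 - np.abs(n * np.asarray(t, dtype=float) - i), 0.0, None)

def warped_hat(i, n, t, cdf):
    """Hat function on the time scale warped by the distribution function."""
    return hat(i, n, cdf(t))

def interpolate(values, t, cdf):
    """Linear interpolation (in warped time) of values observed at the
    times cdf^{-1}(i / n), i = 0, ..., n."""
    n = len(values) - 1
    return sum(values[i] * warped_hat(i, n, t, cdf) for i in range(n + 1))

if __name__ == "__main__":
    cdf = lambda t: np.asarray(t, dtype=float) ** 2   # hypothetical F
    n = 8
    obs_times = np.sqrt(np.arange(n + 1) / n)          # t_i = F^{-1}(i / n)
    values = np.sin(3.0 * obs_times)
    print(interpolate(values, obs_times, cdf) - values)  # approximately zero
```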



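The covariance matrix function $E[\hat Y_t \hat Y_s^\top]$ of the interpolated
process $\hat Y$ is determined by

$$E\big[\hat Y_t^{(l)}\hat Y_s^{(r)}\big] = \sum_{i=1}^{n_l}\sum_{v=1}^{n_r} a_{lr}(t_i^{(l)} \wedge t_v^{(r)})\,b_i^l(t)\,b_v^r(s) + \eta_l^2\,\delta_{l,r}\sum_{i=1}^{n_l} b_i^l(t)\,b_i^l(s)$$

with $A(t) = (a_{lr}(t))_{l,r=1,\dots,d} = \int_0^t \Sigma(s)\,ds$.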

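For any $g = (g^{(1)},\dots,g^{(d)})^\top \in L^2([0,1],\mathbb{R}^d)$ we have

$$E\big[\langle g, \hat Y\rangle^2\big] = \sum_{l,r=1}^{d}\sum_{i=1}^{n_l}\sum_{v=1}^{n_r} a_{lr}(t_i^{(l)} \wedge t_v^{(r)})\,\langle g^{(l)}, b_i^l\rangle\langle g^{(r)}, b_v^r\rangle + \sum_{l=1}^{d}\sum_{i=1}^{n_l}\langle g^{(l)}, b_i^l\rangle^2\,\eta_l^2.$$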



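The sum of the addends induced by the observation noise in diagonal terms
is bounded from above by

$$\sum_{l=1}^{d}\frac{\eta_l^2}{n_l}\int_0^1 \frac{(g^{(l)}(t))^2}{F_l'(t)}\,dt = \sum_{l=1}^{d}\big\|g^{(l)}H_{n,l}\big\|_{L^2}^2,$$

since by virtue of $0 \le \sum_i b_{i,n} \le 1$, $\int b_{i,n} = 1/n$ and
Jensen's inequality:

$$\sum_{i=1}^{n_l}\langle g^{(l)}, b_i^l\rangle^2 = \sum_{i=1}^{n_l}\Big(\int_0^1 \big((g^{(l)}\circ F_l^{-1})\,(F_l^{-1})'\big)\,b_{i,n_l}\Big)^2 \le \frac{1}{n_l}\int_0^1 \big((g^{(l)}\circ F_l^{-1})\,(F_l^{-1})'\big)^2 = \frac{1}{n_l}\int_0^1 \frac{(g^{(l)})^2}{F_l'}.$$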



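On the other hand, we have

$$E\big[\langle g, \operatorname{diag}(H_{n,l})_{1\le l\le d}\,dW\rangle^2\big] = \sum_{l=1}^{d}\big\|g^{(l)}H_{n,l}\big\|_{L^2}^2$$

for a $d$-dimensional standard Brownian motion $W$. Consequently, a process
$\bar Y$ with continuous-time white noise and the same signal part as
$\hat Y$ can be obtained by adding uninformative noise. Introduce the process

$$d\bar Y_t = \Big(\sum_{i=1}^{n_l} X^{(l)}_{t_i^{(l)}}\,b_i^l(t)\Big)_{1\le l\le d}\,dt + \operatorname{diag}(H_{n,l}(t))_{1\le l\le d}\,dW_t, \qquad (A.2)$$

and its associated covariance operator $\bar C : L^2 \to L^2$, given by

$$\bar C g(t) = \Big(\sum_{r=1}^{d}\sum_{i=1}^{n_l}\sum_{v=1}^{n_r} a_{lr}(t_i^{(l)} \wedge t_v^{(r)})\,\langle g^{(r)}, b_v^r\rangle\,b_i^l(t)\Big)_{1\le l\le d} + \big(H_{n,l}(t)^2\,g^{(l)}(t)\big)_{1\le l\le d}.$$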

In fact, it is possible to transfer observations from our original experiment
$\mathcal{E}_0$ to observations of (A.2) by adding $N(0, \bar C - \hat C)$-noise,
where $\hat C : L^2 \to L^2$ is the covariance operator of $\hat Y$. Now,
consider the covariance operator

$$C g(t) = \int_0^1\Big(\int_0^{t\wedge u}\Sigma(s)\,ds\Big)\,g(u)\,du + \big(H_{n,l}(t)^2\,g^{(l)}(t)\big)_{1\le l\le d}$$

associated with the continuous-time experiment $\mathcal{E}_1$.

We can bound $C^{-1/2}$ on $L^2([0,1],\mathbb{R}^d)$ (by partial ordering of
operators) by a simple matrix multiplication operator:

$$C^{-1/2} \le M_{\operatorname{diag}(H_{n,l}(t)^{-1})_l}.$$

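Denote the Hilbert-Schmidt norm by $\|\cdot\|_{HS}$. The asymptotic equivalence
of observing $\bar Y$ and $Y$ in $\mathcal{E}_1$ is ensured by the Hellinger
distance bound

$$H^2\big(\mathcal{L}(\bar Y), \mathcal{L}(Y)\big) \le 2\,\big\|C^{-1/2}(\bar C - C)\,C^{-1/2}\big\|_{HS}^2$$

$$\le 2\int_0^1\!\!\int_0^1 \sum_{l=1}^{d}\sum_{r=1}^{d} H_{n,l}(t)^{-2}H_{n,r}(s)^{-2}\Big(\sum_{i=1}^{n_l}\sum_{v=1}^{n_r} a_{lr}(t_i^{(l)} \wedge t_v^{(r)})\,b_i^l(t)\,b_v^r(s) - a_{lr}(t\wedge s)\Big)^2 dt\,ds$$

$$= 2\int_0^1\!\!\int_0^1 \sum_{l=1}^{d}\sum_{r=1}^{d}\frac{n_l n_r}{\eta_l^2\eta_r^2}\Big(\sum_{i=1}^{n_l}\sum_{v=1}^{n_r} a_{lr}(t_i^{(l)} \wedge t_v^{(r)})\,b_{i,n_l}(u)\,b_{v,n_r}(z) - a_{lr}\big(F_l^{-1}(u) \wedge F_r^{-1}(z)\big)\Big)^2 du\,dz.$$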

The estimate for the L 2 -distance between the function (t, s) \-¥ A(F l l (t) A 
F~ 1 (s)), (l,r) e {1, . . . ,d} 2 , and its coordinate-wise linear interpolation by 
C( r Vin /3 Vn mi{ 2 ) relies on a standard approximation result on a rectangular 
grid of maximal width (n m j n ) _1 based on the fact that this function lies 
in the Sobolev class H 1+ p([0, l] 2 ) with corresponding norm bounded by 
2R 4 . This follows immediately by the product rule from A' = S € H 13 
and (Ff 1 ) / G C^, together with an L -error bound at the skewed diagonal 
{(t,s):F l (t) = F r (s)}. 

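Next, we explicitly show that $\mathcal{E}_1$ is at least as informative as
$\mathcal{E}_0$. To this end we discretise in each component on the intervals
$I_{i,l} = [\frac{i}{n_l} - \frac{1}{2n_l}, \frac{i}{n_l} + \frac{1}{2n_l}] \cap [0,1]$
for $i = 0,\dots,n_l$. Define

$$(Y')_i^{(l)} := \frac{1}{|I_{i,l}|}\int_{F_l^{-1}(I_{i,l})} F_l'(t)\,dY_t^{(l)} = \frac{1}{|I_{i,l}|}\int_{F_l^{-1}(I_{i,l})} X_t^{(l)}F_l'(t)\,dt + \varepsilon_i^{(l)} = \frac{1}{|I_{i,l}|}\int_{I_{i,l}} X^{(l)}_{F_l^{-1}(u)}\,du + \varepsilon_i^{(l)} \qquad (A.3)$$

for $0 \le i \le n_l$ with i.i.d. random variables:

$$\varepsilon_i^{(l)} = \frac{1}{|I_{i,l}|}\int_{F_l^{-1}(I_{i,l})} \eta_l\,(F_l'(t)/n_l)^{1/2}\,dW_t^{(l)} \sim N(0, \eta_l^2). \qquad (A.4)$$

The covariances are calculated as

$$E\big[(Y')_i^{(l)}(Y')_v^{(r)}\big] = \frac{1}{|I_{i,l}||I_{v,r}|}\int_{I_{i,l}}\int_{I_{v,r}} a_{lr}\big(F_l^{-1}(u) \wedge F_r^{-1}(u')\big)\,du'\,du + \eta_l^2\,\delta_{l,r}\delta_{i,v}.$$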
We obtain for the squared Hellinger distance between the laws of observation

$$H^2\Big(\mathcal{L}\big((Y_i^{(l)})_{l=1,\dots,d;\,i=0,\dots,n_l}\big), \mathcal{L}\big(((Y')_i^{(l)})_{l=1,\dots,d;\,i=0,\dots,n_l}\big)\Big)$$

$$\le \sum_{l,r=1}^{d}\eta_l^{-2}\eta_r^{-2}\sum_{i=0}^{n_l}\sum_{v=0}^{n_r}\Big(\frac{1}{|I_{i,l}||I_{v,r}|}\int_{I_{i,l}}\int_{I_{v,r}}\big(a_{lr}(F_l^{-1}(u) \wedge F_r^{-1}(u')) - a_{lr}(F_l^{-1}(i/n_l) \wedge F_r^{-1}(v/n_r))\big)\,du'\,du\Big)^2.$$

Write Af r (u,u') = ai r (Ff 1 (u) A F~ l {u')) and note Af r G tf 1+/3 ([0, l] 2 ) due 
to A' = S eH? and Ff G . For (i, i/) C := {(0,0), (0,n r ), (n,,0), 

(n;,n r )} the rectangle Jjj x 7„ )r is symmetric around (i/ni,v/n r ) such that 
the integral in the preceding display equals (V denotes the gradient) 



(A + tf („ - X) , £ + («' - £)) , («- X - £)) 



Using Jensen's inequality we thus obtain further the bound for the squared 
Hellinger distance: 



E ^ 2 Vr 2 tt { Tnr r K I twvAUiJm + ^u-i/ml 

,r=l i=0u=0 l^MlWl Jli,iXl v , r J0 

v/n r + ${u - v/nr)) ~ ^Af r {i/ni,u/n r )t{(i,u) <£ C)\\ 2 dti dudu' 



tf^J^^O {R\ ni A n r )-^) = O (± ni / V fj 



2 

-2-2/3 



where the order estimate is due to ||Vj4£||#,s < R 2 and a standard L 2 - 
approximation result for Sobolev spaces, observing that for the four corner 
rectangles in C the boundedness of the respective integrals only adds the 
total order 4n" 2 n < nin r ri^~ 2 P . □ 

APPENDIX B: ASYMPTOTICS IN THE BLOCK- WISE CONSTANT 

EXPERIMENT 

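Proof of Theorem 3.2. As we have seen, the estimator is unbiased in
$\mathcal{E}_2$. For the covariance structure we use the independence between
blocks and frequencies and the commutativity with $\mathcal{Z}$ to infer

$$\mathrm{COV}_{\mathcal{E}_2}\big(I_n^{1/2}\,\mathrm{LMM}_{or}^{(n)}\big) = I_n^{1/2}\sum_{k=0}^{h^{-1}-1} h^2\sum_{j=1}^{\infty} W_{jk}\,\mathrm{COV}_{\mathcal{E}_2}\big(\operatorname{vec}(S_{jk}S_{jk}^\top)\big)\,W_{jk}^\top\,I_n^{1/2} = I_n^{1/2}\sum_{k=0}^{h^{-1}-1} h^2 I_k^{-1}\,I_n^{1/2}\,\mathcal{Z} = \mathcal{Z}. \qquad (B.1)$$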
k=0 
Since the local Fisher-type informations are strictly positive definite and
thus invertible by Assumption 2.2(iii), the multivariate CLT (3.10) for the
oracle estimator follows by applying a standard CLT for triangular schemes as
Theorem 4.12 from [14]. The Lindeberg condition is implied by the stronger
Lyapunov condition, which is easily verified here by bounding moments of
order 4.
In Appendix C below we prove that in experiment E\ the estimator 
LMmI™' has an additional bias of order ©(n,^ 2 ) + Op(h) and a differ- 
ence in the covariance of order OQvnJ*^) + Op(h 2 ) under our Assumption 
2.2(«i-a),(ra-£), which by Slutsky's lemma yields an asymptotically negli- 
gible term compared to the best attainable rate (in any entry) n m Jx , cf. 
Theorem 4.3. □ 

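Proof of Corollary 3.3. An important property of our oracle estimator is its
equivariance with respect to invertible linear transformations $A_k$ on each
block $k$ in the sense that for observed statistics
$\tilde S_{jk} := A_k S_{jk} \sim N(0, \tilde C_{jk})$ under $\mathcal{E}_2$ we
obtain ($A^{-\top} := (A^\top)^{-1}$ for short)

$$\tilde C_{jk} = A_k C_{jk} A_k^\top, \quad \tilde I_{jk} = (A_k\otimes A_k)^{-\top} I_{jk}\,(A_k\otimes A_k)^{-1}, \quad \tilde I_k = (A_k\otimes A_k)^{-\top} I_k\,(A_k\otimes A_k)^{-1}$$

and hence

$$\mathrm{LMM}_{or}^{(n)} = \sum_{k=0}^{h^{-1}-1} h\,(A_k\otimes A_k)^{-1}\,\tilde I_k^{-1}\sum_{j\ge 1}\tilde I_{jk}\,(A_k\otimes A_k)\operatorname{vec}(S_{jk}S_{jk}^\top) = \sum_{k=0}^{h^{-1}-1}(A_k\otimes A_k)^{-1}\Big(h\,\tilde I_k^{-1}\sum_{j\ge 1}\tilde I_{jk}\operatorname{vec}(\tilde S_{jk}\tilde S_{jk}^\top)\Big).$$

For the covariance we use commutativity with $\mathcal{Z}$ and obtain likewise

$$\mathrm{COV}_{\mathcal{E}_2}\big(\mathrm{LMM}_{or}^{(n)}\big) = \sum_{k=0}^{h^{-1}-1} h^2\,(A_k\otimes A_k)^{-1}\,\tilde I_k^{-1}\,(A_k\otimes A_k)^{-\top}\mathcal{Z}. \qquad (B.2)$$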

We use this property to diagonalise the problem on each block. In terms of the
noise level matrix $\mathcal{H}_k := \operatorname{diag}(H_{n,l}^{kh})_{l=1,\dots,d}$
let $O_k$ be an orthogonal matrix such that

$$\Lambda^{kh} = O_k\,\mathcal{H}_k^{-1}\,\Sigma^{kh}\,\mathcal{H}_k^{-1}\,O_k^\top \qquad (B.3)$$

is diagonal. Note that $\Lambda^{kh}$ grows with $n$, but we drop the dependence
on $n$ in the notation for all matrices $\Lambda^{kh}$, $O_k$ and
$\mathcal{H}_k$. Use $A_k = O_k\mathcal{H}_k^{-1}$ to obtain the spectral
statistics (3.2) transformed:

$$\tilde S_{jk} = O_k\,\mathcal{H}_k^{-1}S_{jk} \sim N(0, \tilde C_{jk}) \quad \text{independent for all } (j,k),$$

which yields a simple-structured diagonal covariance matrix:

$$\tilde C_{jk} = O_k\,\mathcal{H}_k^{-1}\,C_{jk}\,\mathcal{H}_k^{-1}\,O_k^\top = \Lambda^{kh} + \pi^2 j^2 h^{-2}E_d.$$
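
The diagonalisation step can be reproduced numerically. The sketch below
assumes the covariance form
$C_{jk} = \Sigma^{kh} + \pi^2 j^2 h^{-2}\mathcal{H}_k^2$ of the spectral
statistics and obtains $O_k$ from an eigendecomposition; the numerical values
and function names are arbitrary illustrations.

```python
import numpy as np

def diagonalising_transform(sigma, noise_levels):
    """Return A = O H^{-1} and the diagonal of Lambda = O H^{-1} Sigma H^{-1} O^T."""
    h_inv = np.diag(1.0 / np.asarray(noise_levels, dtype=float))
    eigenvalues, eigenvectors = np.linalg.eigh(h_inv @ sigma @ h_inv)
    o = eigenvectors.T  # orthogonal matrix with O M O^T diagonal
    return o @ h_inv, eigenvalues

def spectral_covariance(sigma, noise_levels, j, h):
    """Assumed covariance Sigma + pi^2 j^2 h^{-2} H^2 of a spectral statistic."""
    noise_sq = np.asarray(noise_levels, dtype=float) ** 2
    return sigma + (np.pi * j / h) ** 2 * np.diag(noise_sq)

if __name__ == "__main__":
    sigma = np.array([[1.0, 0.5], [0.5, 2.0]])
    noise = [0.5, 2.0]
    a, lam = diagonalising_transform(sigma, noise)
    c_tilde = a @ spectral_covariance(sigma, noise, j=3, h=0.1) @ a.T
    print(np.round(c_tilde, 8))  # diagonal: lam + (3 * pi / 0.1) ** 2
```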

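A key point is that the covariance structure (3.7) in
$\mathbb{R}^{d^2\times d^2}$ is for independent components $\tilde S_{jk}$ also
diagonal, up to symmetry in the volatility matrix entries. Summing
$\tilde I_{jk}$ over $j$ is explicitly solvable and gives for
$p, q = 1,\dots,d$

$$\big(h\,\tilde I_k^{-1}\big)_{pq,pq} = \Big(h^{-1}\sum_{j=1}^{\infty}\big(\tilde C_{jk}^{-1}\otimes\tilde C_{jk}^{-1}\big)_{pq,pq}\Big)^{-1} = \Big(h^{-1}\sum_{j=1}^{\infty}\big(\Lambda^{kh}_{pp} + \pi^2j^2h^{-2}\big)^{-1}\big(\Lambda^{kh}_{qq} + \pi^2j^2h^{-2}\big)^{-1}\Big)^{-1}$$

$$= \Bigg(\frac{\sqrt{\Lambda^{kh}_{qq}}\coth\big(h\sqrt{\Lambda^{kh}_{pp}}\big) - \sqrt{\Lambda^{kh}_{pp}}\coth\big(h\sqrt{\Lambda^{kh}_{qq}}\big)}{2\sqrt{\Lambda^{kh}_{pp}\Lambda^{kh}_{qq}}\,\big(\Lambda^{kh}_{qq} - \Lambda^{kh}_{pp}\big)} - \frac{1}{2h\,\Lambda^{kh}_{pp}\Lambda^{kh}_{qq}}\Bigg)^{-1}$$

$$= 2\Big(\Lambda^{kh}_{pp}\sqrt{\Lambda^{kh}_{qq}} + \Lambda^{kh}_{qq}\sqrt{\Lambda^{kh}_{pp}}\Big)\Big(1 + O\Big(e^{-2h\sqrt{\Lambda^{kh}_{pp}\wedge\Lambda^{kh}_{qq}}} + h^{-1}\big(\Lambda^{kh}_{pp}\wedge\Lambda^{kh}_{qq}\big)^{-1/2}\Big)\Big),$$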

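using $\Lambda^{kh} \ge \big(\min_{l,t} n_l F_l'(t)\eta_l^{-2}\big)\lambda_{\min}(\Sigma^{kh})E_d \gtrsim n_{\min}E_d$,
$h^2 n_{\min} \to \infty$ and $\coth(x) = 1 + O(e^{-2x})$ for $x \to \infty$. We
thus obtain uniformly over $k$

$$h\,\tilde I_k^{-1} = (2 + o(1))\big(\Lambda^{kh}\otimes\sqrt{\Lambda^{kh}} + \sqrt{\Lambda^{kh}}\otimes\Lambda^{kh}\big).$$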
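
The explicit summation over $j$ rests on the partial-fraction identity
$\sum_{j\ge1}(\lambda + \pi^2j^2h^{-2})^{-1} = \frac{h}{2\sqrt{\lambda}}\coth(h\sqrt{\lambda}) - \frac{1}{2\lambda}$.
The following sketch compares a truncated sum with the resulting closed form
and with the leading-order term; the numerical values are arbitrary, the two
eigenvalues must differ, and the function names are illustrative.

```python
import numpy as np

def fisher_sum_numeric(lam_p, lam_q, h, n_terms=1_000_000):
    """Truncated version of h^{-1} sum_j 1 / ((lam_p + x_j)(lam_q + x_j)),
    where x_j = pi^2 j^2 / h^2."""
    x = (np.pi * np.arange(1, n_terms + 1) / h) ** 2
    return np.sum(1.0 / ((lam_p + x) * (lam_q + x))) / h

def fisher_sum_closed_form(lam_p, lam_q, h):
    """Closed form via coth; requires lam_p != lam_q."""
    sp, sq = np.sqrt(lam_p), np.sqrt(lam_q)
    coth_p = 1.0 / np.tanh(h * sp)
    coth_q = 1.0 / np.tanh(h * sq)
    return ((sq * coth_p - sp * coth_q) / (2.0 * sp * sq * (lam_q - lam_p))
            - 1.0 / (2.0 * h * lam_p * lam_q))

def leading_order_inverse(lam_p, lam_q):
    """Leading term 2 (lam_p sqrt(lam_q) + lam_q sqrt(lam_p)) of the inverse."""
    return 2.0 * (lam_p * np.sqrt(lam_q) + lam_q * np.sqrt(lam_p))

if __name__ == "__main__":
    lam_p, lam_q, h = 4.0e4, 9.0e4, 0.5
    exact = fisher_sum_closed_form(lam_p, lam_q, h)
    print(fisher_sum_numeric(lam_p, lam_q, h), exact)
    print(1.0 / exact, leading_order_inverse(lam_p, lam_q))
```

The relative gap between the exact inverse and the leading term is of the
order $h^{-1}(\lambda_p\wedge\lambda_q)^{-1/2}$, in line with the remainder in
the display above.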
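By formula (B.2) we infer in terms of
$(\Sigma^{kh}_{\mathcal{H}})^{1/2} := \mathcal{H}_k(\mathcal{H}_k^{-1}\Sigma^{kh}\mathcal{H}_k^{-1})^{1/2}\mathcal{H}_k$

$$\mathrm{COV}_{\mathcal{E}_2}\big(\mathrm{LMM}_{or}^{(n)}\big) = (2 + o(1))\sum_{k=0}^{h^{-1}-1} h\,\Big(\Sigma^{kh}\otimes(\Sigma^{kh}_{\mathcal{H}})^{1/2} + (\Sigma^{kh}_{\mathcal{H}})^{1/2}\otimes\Sigma^{kh}\Big)\mathcal{Z}.$$

The final step consists in combining $n_{\min}^{1/2}H_{n,l}(t) \to H_l(t)$
uniformly in $t$ together with a Riemann sum approximation to conclude

$$\lim_{n_{\min}\to\infty} n_{\min}^{1/2}\,\mathrm{COV}_{\mathcal{E}_2}\big(\mathrm{LMM}_{or}^{(n)}\big) = 2\Big(\int_0^1\Big(\Sigma\otimes\big(\mathcal{H}(\mathcal{H}^{-1}\Sigma\mathcal{H}^{-1})^{1/2}\mathcal{H}\big) + \big(\mathcal{H}(\mathcal{H}^{-1}\Sigma\mathcal{H}^{-1})^{1/2}\mathcal{H}\big)\otimes\Sigma\Big)(t)\,dt\Big)\mathcal{Z}.$$

□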



APPENDIX C: PROOFS FOR CONTINUOUS MODELS 



C.l. Weight matrix estimates. We shall often need general norm 
bounds on the weight matrices Wj k - 
Lemma C.1. The oracle weight matrices satisfy uniformly over $(j,k)$
and the matrices $\Sigma^{kh}$ with
$\|\Sigma^{kh}\|_\infty + \|(\Sigma^{kh})^{-1}\|_\infty \lesssim 1$

$$\|W_{jk}\| \lesssim h_0^{-1}\big(1 + j^4/h_0^4\big)^{-1}.$$

Proof. From the proof of Corollary 3.3 we infer 
W jk = (H k Ol H k O T k )W jk (O k H k l ® (O k H k v ) with 



W jk = (2 + o(l))/ l - 1 ((A fcfe C- fc 1 ) ® (^Cj, 1 ) + (VX^ 1 ) ® (A kh C^)). 
We evaluate one factor in Wj k using 

\\H k Olk kh C^O k H k l \\ = WE^E^Wfh^HD^W < (i+^-^j-i. 



By \\A®B\\ < Pllll^ll and VX^C^ 1 = (A kh Cr k 1 )(A kh )~ 1 / 2 (the matrices 
are diagonal), we infer 

ll^ll < hT\l +j 2 hu 2 y 2 \\H k Oj(A kh )- 1 / 2 O k H^\\. 

To evaluate the last norm, although matrix multiplication is non-commutative,
we note

(Oj {k kh )- l 20 k H k l ) T Ol{A kh )~lo k H k l = H^Olik^OkH^ = (Z kh )~ 



whence by polar decomposition \Oj (A kh ) l / 2 O k H k 1 \ = (E fch ) l / 2 implies 



Ol{K kh )-\o k H k 1 \\ = \\(T kh )- 1 2\\<l. 



k 

— 1/2 

Together with \\H k \\ < n ■ this yields 



\W jk \\ Zh-^l+jX 2 )-^, , 



- 2 r 2 n- 1/2 

which gives the result. □ 



Moreover, for the adaptive estimator we have to control the dependence of the
weight matrices $W_{jk} = W_j(\Sigma^{kh})$ on $\Sigma^{kh}$. We use the notion
of matrix differentiation as introduced in [9]: define the derivative
$dA/dB$ of a matrix-valued function $A(B) \in \mathbb{R}^{o\times p}$ with
respect to $B \in \mathbb{R}^{q\times r}$ as the $\mathbb{R}^{qr\times op}$
matrix with row vectors $(d/dB_{ab})\operatorname{vec}(A)$, $1 \le a \le q$,
$1 \le b \le r$.

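Lemma C.2. For the derivatives of the oracle weight matrices
$W_j(\Sigma^{kh})$, assuming
$\|\Sigma^{kh}\|_\infty + \|(\Sigma^{kh})^{-1}\|_\infty \lesssim 1$, we have
uniformly over $(j,k)$:

$$\Big\|\frac{d}{d\Sigma^{kh}}W_j(\Sigma^{kh})\Big\| \lesssim h_0^{-1}\big(1 + j^4h_0^{-4}\big)^{-1}. \qquad (C.1)$$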
Proof. Since the notion of matrix derivatives relies on vectorisation, the
identities
$\operatorname{vec}(I_k^{-1}I_{jk}) = (E_{d^2}\otimes I_k^{-1})\operatorname{vec}(I_{jk}) = (I_{jk}\otimes E_{d^2})\operatorname{vec}(I_k^{-1})$
give rise to the matrix differentiation product rule

d 



dT, kh 



W. 



+ E, 



d 2 



Applying the mixed product rule (A®B)(C ®D) = [AC ®BD) repeatedly, 
and the differentiation product rule and chain rule to Ij k = Cj k (g) C, 
obtain 



jk ' 



we 



d 



dCjk 



C 



C jk 



d 2 , 



'Cj k l ® <#) ® (C^ ® C-/)) (((Cfr ® Ed® E, 

+{E d 2 ®E d ® C jk )){E d ® C d)d ® E d )({vec{E d ) ® E d 2) + {E d 2® vec{E d )))), 

with the so-called commutation matrix C d)d = Z — E d 2. By orthogonality 
of the last factors in both addends, ||A® .B|| = ||A||||S||, and the mixed 
product rule, we infer for the norm of the second addend in (C.2) 



(E d2 ® K 



dljk 
d^ kh 



<2\\{E d ®Cj k 1 )®{l k l {Cj k l ®Cj k 1 )) 



By virtue of 



2 \\W jk \\ \\c£ 



dh. 



< WW,. 



jk\ 



dlZ 



i I k 1 ®E d 2)j^ K = -{E d 2®I k ) d ^ kh 
it follows with the mixed product rule that 

dI k l /dY, kh = -{I k l ® I k l ){dl k /d^ kh ) 
This yields for the norm of the first addend in (C.2) 



{L k ®E (P ) 



dl, 



-i 



dY, kh 



dh 



dY, kh 



< \\w jk \ 

< \\w Jk \ 



dlfk 



dY, kh 



j'k\ 



< 



\\W jk \\ 



since we can differentiate inside the sum by the absolute convergence of 



\\Wj' k \\. This proves our claim by Lemma C.l. 



□ 
C.2. Bias bound. Using the formula $1 - 2\sin^2(x) = \cos(2x)$ and Itô
isometry, the $(d\times d)$-matrix of (negative) biases (in the signal) of the
addends in (3.8) as an estimator of $\Sigma^{kh}$ in experiment $\mathcal{E}_1$
is given by

$$B_{jk} := 2h^{-1}\int_{kh}^{(k+1)h}\Sigma(t)\cos\big(2j\pi h^{-1}(t - kh)\big)\,dt,$$

which has the structure of a $j$-th Fourier cosine coefficient. We introduce
the corresponding weighting function in the time domain:

$$G_k(u) = 2\sum_{j=1}^{\infty}W_{jk}\cos(2j\pi u) \in \mathbb{R}^{d^2\times d^2}, \quad u \in [0,1].$$

Parseval's identity then shows for the $d^2$-dimensional block-wise bias vector
of (3.8):

$$\sum_{j=1}^{\infty}W_{jk}\operatorname{vec}(B_{jk}) = \int_{kh}^{(k+1)h} h^{-1}G_k\big(h^{-1}(t - kh)\big)\operatorname{vec}(\Sigma(t))\,dt.$$

The vector of total biases of (3.8) is then the linear functional of $\Sigma$:

$$\sum_{k=0}^{h^{-1}-1} h\sum_{j=1}^{\infty}W_{jk}\operatorname{vec}(B_{jk}) = \int_0^1 G_h(t)\operatorname{vec}(\Sigma(t))\,dt$$

where for $t \in [kh, (k+1)h)$

$$G_h(t) = G_k\big(h^{-1}(t - kh)\big) = 2\sum_{j=1}^{\infty}W_{jk}\cos\big(2\pi j h^{-1}t\big).$$

For £ in the Besov space .Bf ^([0, 1]), < a < 1, the L 1 -modulus of 
continuity satisfies a; i iQ .i])(S) S) < ||£||_b« <P, see e.g. [7, Section 3.2]. We 
have for 5 G (0, 1) and s G [0,1-5] 
S 

uec(E(i + a)) cos(^) dt 
r s 

cos(^±) dt < sup / \vec(E(t + s)-T,(t + v + s))\dt<u) L in ss+S]) (E,5). 

0<v<8J0 ' 

This shows for the total bias in estimation of the volatility in X by the 
bound on in Lemma C.l 

h- 1 -! oo 



1 


fj 


~ 5 


Jo Jo 







G h (t)vec(L(t))dt 



< 2 Y Y^Wj^Lmkh^k+m^ih/j) 

k=0 0=1 

oo 

< Y h \l + (h /j) 4 y\h/j) a ^ (h/h T = n-^i 2 . 

3=1 
We thus have a bias of order $O(n_{\min}^{-\alpha/2})$. Remark that it is quite
surprising that this bias bound is independent of $h$, which is also at the
heart of the quasi-maximum likelihood method [1].

If $\operatorname{vec}(\Sigma)$ is a (vector-valued) square-integrable
martingale, then we use that martingale differences are uncorrelated and write
for the total bias

$$\int_0^1 G_h(t)\operatorname{vec}(\Sigma(t))\,dt = \int_0^1 G_h(t)\operatorname{vec}\big(\Sigma(t) - \Sigma(\lfloor h^{-1}t\rfloor h)\big)\,dt,$$

using $\int G_k = 0$. This expression is centred with covariance matrix

$$\sum_{k=0}^{h^{-1}-1}\int_{[kh,(k+1)h]^2} G_k\big(h^{-1}(t - kh)\big)\,E\big[\operatorname{vec}(\Sigma(t) - \Sigma(kh))\operatorname{vec}(\Sigma(s) - \Sigma(kh))^\top\big]\,G_k\big(h^{-1}(s - kh)\big)^\top\,dt\,ds.$$

The expected value in the display is smaller than (in matrix ordering)
$E[\operatorname{vec}(\Sigma((k+1)h) - \Sigma(kh))\operatorname{vec}(\Sigma((k+1)h) - \Sigma(kh))^\top]$.
Because of $\|G_k\|_\infty \lesssim 1$ the covariance matrix (in any norm) is of
order $O(h^2E[\|\Sigma(1) - \Sigma(0)\|^2]) = O(h^2)$.

If $\Sigma = \Sigma^B + \Sigma^M$ is the sum of a function $\Sigma^B$ in
$B^\alpha_{1,\infty}([0,1])$ and a square-integrable martingale $\Sigma^M$, then
the preceding estimations apply for each summand and the total bias has maximal
order $O(n_{\min}^{-\alpha/2}) + O_P(h)$.

C.3. Variance for general continuous-time model. The covariance 
for the estimator under model E\ can be calculated as under model £2, but 
we lose independence between different frequencies j,j' on the same block. 
For that we use the formula for Gaussian random vectors A, B 

COV{vec{AA T ), vec(BB T )) = (COV(B, B) ® <COV(A, B) + COY {A, A)® 
COV(A B) + COV(A B) ® COV(A A) + COV(A, B) ® COV(B, B))Z/A, 

obtained by polarisation. This implies 

||COV £l (LMMW) - C0V £2 (LMMW)|| 

< £ h 2 £ ||^fe||||^-fe(COV £l (5 3 - fc ,5 ife )(g)COV£ 1 (5 jA! ,5 /fc ))||. 

k=0 j,j'=l 
From Lemma C.1 and $\|A\otimes B\| \le \|A\|\|B\|$ for matrices $A$, $B$ we
infer that the series over $j, j'$ is bounded in order by



oo ! 

£ h 2 (l+j'/h )-\l+j/h )- 2 ( / (E-E 

,7' = 1 ° 



^'fellL 2 ll^j'A:llL 2 



_|_ /" J;„»/ir2 

•/o 



The identities 2 cos(a) cos(6) = cos(a + b) + cos(a — 6), 2 sin(a) sin(6) = 
cos(a — b) — cos(a + 6) and the same bound as in Section C.2 imply for 
E, (FJ)- 1 , . . . , (fjT 1 € Bf_([0, 1]) (note that even (F/)- 1 6 C°([0, 1])) 



< + /i ( 1 - <J i,i') 



x l|E|| B f ([^(fe+i)^]) 



and similarly the bound 

. ,/ /i — 5 7 - 7 /)\ « . 2 - i 

^ J JJ ^ m f X IK^) llB fi00 ([fch,(fc+l)h]) 

for the norm over H 2 l . Putting all estimates together gives 
||COV £l (LMMW) - COV f2 (LMMW)|| 

oo 

<fc £ V 2 (i + //M" 4 (i + i/M" 2 ^(i + U-/l)"°(i + iiV)- 

By comparison with the double integral (in terms of x ~ j/ho, y ~ f /ho) 

/•oo /*oo 

/ / (l + y)- 4 (l + x)- 2 |x-2/l~ a (l + ^)^y < 1 
Jo Jo 

we conclude 

||COV ei (LMM&>) - COV £2 (LMMW)|| < /m"^ 2 . 

Arguing exactly as in Section C.2 for the case of £ being a sum of a Bf^- 
function and an L 2 -martingale, the difference of covariances is in general of 
order 0(hn-«l 2 ) + P (h 2 ). 






C.4. Proof of Theorem 3.4. Let us denote the rate of convergence of 
E by S n = ^ m "^ 4a+2 ^- For later use we note the order bounds 

$n = o{v I /1q ^ {nmin/ n max) ^ )> $n = ^(j^O ( n min/ n max) ^ )• (C.3) 

First, we show that 



(C.4) 



which by Slutsky's Lemma implies the CLT with normalisation matrix I n . 
This in turn is already sufficient for obtaining the result of Corollary 3.3 for 

(n) 

LMM;7 . Let us start with proving that 



rpm ._ 

n •" 



r-i-l (m+l)r/h-l oo 

E h E Y.{ w ^ mr )-w 3 {^ mr ))z 3k =o P ( n -y*) 

m=0 k=mr/h j=l 



where the random variables 

Z jk = vec (s jk Sj k - 7r 2 j 2 h- 2 diag ((H^) 2 )^ - E 

are independent, ¥.£ 2 [Zjk] = 0, COV,f 2 (Zjk) = IJk^- We have 



kh 



T — 1 OO 



T n m < E ^Epi^)-^^ 

m=0 j=l 



(m+l)r/h-l 

E 

k=mr/h 



(C.5) 



since the weight matrices do not depend on k on the same block of the coarse 
grid. Using Lemma C.2 and that ||E — S|| L i = Op(6 n ), we obtain 



Wj(t mr ) - W j {T 1 mr ) 



< max 
k 



dWj(j: kh ) 



dE kh 



Op ((/i 1 A /iqJ 4 ) r - S||£,x([ mr) ( m+1 ) r ])) 



For the second factor in (C.5) we employ ||COVf 2 (Z J 7 c )|| = 2||Cjfc|| 2 such 
that \\Zjk\\ = Op(\\Cjk\\)- Consequently, (C.3) implies for T™ the bound 



r — 1 oo 



J2 hJ2 Op {(K 1 A hlry-'Wt - E|| Ll([w>(m+1)r]) (r/fc) 1/2 (l V j 2 h 

m=0 j=l 

= O p (r-y*h 1 / 2 6 n )=Op(n-t). 
The asymptotics (C.4) follow if we can ensure that the coarse grid
approximations of the weights induce a negligible error, i.e. if also



r -i_i (m+l)r/ft-l oo 
E E # E {W^ kh ) - W i (E raP )) % = o P (n- m \ 

m=0 k=mr/h 3=1 



M 

max I 



holds. The term is centred and its covariance matrix is bounded in norm by 



r _1 -l (m+l)r/h-l oo 

E E h 2 Y,\\w^ kh )-w^ m n 

m=0 k=mr/h 3=1 



jk II- 



From Lemma C.2, = 2\\C jk \\ 2 < 1 + j 4 h^ 4 and S 6 B? jOO ([0, 1]) we 

derive the upper bound 

oo 

o( E ^E-V(i+Ao 4 )- 1 ) = °(<^ 2a )=°(^) 

fc=0 j=l 

by the choice of r and a > 1/2. 

Another application of Slutsky's Lemma yields the CLT with normalisa- 

l/2~ — 1/2 

tion matrix I n provided I n In — >■ -E^a in probability. The proof of Lemma 
C.2, more specifically the bound on the last term in (C.2), yields also 

d 



dTi kh J*. > 



<^(i + jV) 



4j,-4\-1 



This implies J2k,j\\Ijk ~ IjkW = P (h- l 5 n ). Using A' 1 - A' 1 = A~ l (A 
A)A~ l and \\I£ ^ < h^ 1 , we infer 



h — 1 oo i oo _^ 

n'-In'H E ^ (E4) -(E J ife) =O P (h5 n h 2 ) 
k=0 



3=1 



3=1 



The smallest eigenvalue of I^ 1 equals ||I n || 1 which has order at least 



-1/2 



. The global Lipschitz constant L n of f(x) = x 1 / 2 for x > ||I n || 1 



is 



therefore of order Umax- The perturbation result from [16] for functional 
calculus therefore implies 

HI^I- 1 / 2 - E d \\ < ^lll^mil-i _ = O P (nl/ 2 x h5nh 2 )- 

The order is (n max / 'n^n) 1 ^h^ 1 5 n and tends to zero by (C.3). 
APPENDIX D: PROOF OF THE LOWER BOUND

D.l. Proof of Lemma 4.1. Since M^ R ,^i/2T r is an isometry on 

L 2 ([0, l];R d ), we obtain directly for the adjoint T* = T^M^-i. We ob- 
serve in a formal differential notation: 

T;M {Riy/ 2 dY = T~ l M [RI) -^ (Xdt + j^dW) 

= -T- l I*{M {{RI) -y2 0y Xdt + M m -y, dX) + j^dW 
= -r^(M m -i / 2 0y Xdt + M (m -i / 2 dX) + ^dW. 

Here, we use that T*M., R ,y/ 2 Q is an L 2 -isometry and we introduce the 
independent Brownian motions W, B via the differentials 

dW = T;M (RI) i/2 dW, dB = T*M {R ,y/2 dB 
or alternatively (apply —I*) via their coordinates i = 1, . . . , d as 

Wi(u) = £ P R'^s^O^s) dW 3 (s), (D.l) 
i=i 7 ° 

and B~i(u) analogously. 

The formal derivations are made rigorous by duality, that is testing 
stochastic differentials with deterministic L 2 -functions. We infer from the 
coordinate- wise definition of W for / G L 2 ([0, l];R d ) (e.g., check via indica- 
tor functions /) 

f\o(t) T R'(t)^ 2 (T r f)(t),dW t ) = f\f(u),dW(u)) 
Jo Jo 

and equally for B. Now consider for functions
$g \in L^2([0,1];\mathbb{R}^d)$ the real observations

\o(t) T Ii!(t) 1 '\T r g)(t),dYt) = l\o{tf Hit)- 1 ' 2 '(T r Ig)' \t),dY t ) 



o 



\{0{t) T B!(t)- l l\T r Ig))' - (0(t) T R'(t)-^ 2 y(T r Ig)(t),X t )dt 

4= f\o{t) T R!(t)y\T r g){t),dW t ) 
V n Jo 

\-{0{t) Y E'(t)- 1 l 2 )'{T r Ig){t),X t )dt 

1 {0{t) T R'(t)- l / 2 {T r Ig){t),dX t ) + -1= [ (g(u),dW u ) 

V n Jo 

(-(0(t) J E!(t)- l l 2 )'{T r Ig){t),X t ) dt 

/' 1 (E(t) 1 /2o(t) T i2'(t)- 1 /2(r r i 5 )(t) ) dB t ) + 4= f\g(u),dw u ). 

Jo V n 



For e = we use (i?') 1 / 2 A(i? / ) x / 2 = A and evaluate the first two terms of 
the last display as 

\-(0(t) T Il!{t)- 1 ' 2 ) , (T r Ig)(t),X t )dt- ( ((T~ l K{T r Ig))(u),dB u ). 



As A is constant in time, the second term is equal to — ^(Ig, AdB) and the 
formal derivations above are confirmed. 

D.2. Proof of Lemma 4.2. In a first step note that for general operators
$A$, $B$ we have

$$\|AA^* - BB^*\|_{HS} = \tfrac{1}{2}\big\|(2A + B - A)(A - B)^* + (A - B)(2A + B - A)^*\big\|_{HS} \le 2\|A\|\,\|A - B\|_{HS} + \|A - B\|_{HS}^2.$$
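
For matrices the polarisation identity and the resulting bound can be verified
numerically. The sketch below assumes the symmetrised form
$AA^* - BB^* = \tfrac12[(A+B)(A-B)^* + (A-B)(A+B)^*]$ with
$A + B = 2A + (B - A)$, the operator (spectral) norm for $\|A\|$ and the
Frobenius norm as Hilbert-Schmidt norm; the function names are illustrative.

```python
import numpy as np

def difference_of_squares(a, b):
    """A A* - B B*."""
    return a @ a.conj().T - b @ b.conj().T

def polarised_form(a, b):
    """(1/2) [(A + B)(A - B)* + (A - B)(A + B)*]."""
    s, d = a + b, a - b
    return 0.5 * (s @ d.conj().T + d @ s.conj().T)

def upper_bound(a, b):
    """2 ||A|| ||A - B||_HS + ||A - B||_HS^2."""
    diff_hs = np.linalg.norm(a - b, "fro")
    return 2.0 * np.linalg.norm(a, 2) * diff_hs + diff_hs ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((5, 5))
    b = a + 0.1 * rng.standard_normal((5, 5))
    print(np.linalg.norm(difference_of_squares(a, b), "fro"), upper_bound(a, b))
```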

Hence, it suffices to show 

WQnTQlSw S 1 and WQnfQlS - c-^c^Us < i. 

A further reduction is achieved by splitting terms to obtain 
\\Qnl /2 Qn, 2 l ~ C-^C^Whs < Hld-C-^Q^IIHSIIQ^Q^II 



+ ii^",o /2 <o 2 ii WQnTQnlw n id -QnT cl £\ 



HS- 
Owing to WC'^Ql/h < 1 + \\ld-C~l /2 Q HoWhs it remains to show

WQnTQnJW S 1, WU-C-^Q^Whs < 1 and wid-Q-^c^Us < i. 

Finally, we can use Q n ,i - Qn,o = Qoo,i - Qoo,o, Qn,i > Qoo,i in operator 
order (and similarly for C nj£ ) as well as \a — 1| < \a 2 — 1| for a > implying 
\\A — Id\\ hs < || AA* — Id | Iks for positive operators A. We are thus left with 
proving that the following three quantities are uniformly bounded 

|i n -l/2 n l/2|| n r -l/2 (n r N^-l/2,, , |0 -l/2 rrl l/2 n l/2v n -l/2,i 

By the Feldman-Hajek Theorem for Gaussian measures, see e.g. [8], the lat- 
ter two quantities are finite iff the Gaussian laws N(0, Coo, £ ) and N(0, Qoo,e), 
are equivalent for e £ {0, 1}. Using again differential notation, these are the 
laws of 

Z c := T?M {RI) i/2 X, Z Q := -I*T*M {RI) -i/2 dX 

where dX = Tr'^dB for the e at hand. Both processes are images in 
C([0, 1], Mr) under the linear (and thus measurable) map T~ l = T*A4 R > 
of the respective processes 

Z c := M {RI) -i/2 X, Z Q := -I*M m -i,2 dX. 

By the product rule we see 

Z C (t) = -I*{M (RI) -i/2 dX + M {{RI) -i/* yX}(t) 

= Z Q (t)+ [\(R')- 1/2 Oy(s)X(s)ds. 
Jo 

Hence, Z c equals the Brownian martingale plus an adapted linear drift 
in X. By Girsanov's theorem, noting that all deterministic quantities are 
continuous and bounded away from zero, the laws of Z c and Z® are equiv- 
alent, e.g. use Thm. 3.5.1. together with Cor. 3.5.16 in [15]. Hence, so are 
the laws of their images Z c and Z®, as required. 

Let us finally consider Q n oQni- Its squared norm equals 

(Qn,lf,f) _ (M {Rf) -^ 0J:i0 T iR r rl/2 T r If,T r If) + I||/|| 2 

£5 (Qnflf, f) /e£ l|//H 2 + £ll/H 2 

< (1 + ||Af Hoc) max IKr- 1 )'!^. 

i=l,...,d 

This uniform bound is finite under our regularity assumptions. 
D.3. Proof of Theorem 4.3. Without loss of generality we may assume that
$A(t)$ is symmetric for all $t$ because $\Sigma(t)$ is symmetric. Owing to

1 /2 1/2 

Zvec{A) = 2vec{A) and (E ® Eq' )uec(A) = vec(£o' j4£ ), we thus have 
to show in terms of the Hilbert-Schmidt scalar product 

Var £=0 (tf n ) > (8+ ^_ (1)) /'((So ® sj /2 + £* /2 ® Z )A)(t), A(t)) HS dt. 
V n •'0 
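The Kronecker identity $(\Sigma_0 \otimes \Sigma_0^{1/2})\,vec(A) = vec(\Sigma_0^{1/2} A \Sigma_0)$ used here is the standard rule $(B \otimes C)\,vec(A) = vec(C A B^\top)$ for column-stacking $vec$. A small numerical sketch follows, with a randomly generated positive definite matrix standing in for $\Sigma_0$ at a fixed time point. The helper names `vec` and `sym_sqrt` are illustrative assumptions, not notation from the paper.

```python
import numpy as np

def vec(A):
    """Column-stacking vectorisation of a matrix."""
    return A.reshape(-1, order="F")

def sym_sqrt(S):
    """Symmetric square root of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T

rng = np.random.default_rng(1)
d = 4
G = rng.standard_normal((d, d))
Sigma0 = G @ G.T + d * np.eye(d)   # stand-in for Sigma_0(t) at a fixed t
root = sym_sqrt(Sigma0)            # Sigma_0^{1/2}
B = rng.standard_normal((d, d))
A = B + B.T                        # symmetric direction A

# (Sigma_0 (x) Sigma_0^{1/2}) vec(A) = vec(Sigma_0^{1/2} A Sigma_0)
lhs = np.kron(Sigma0, root) @ vec(A)
rhs = vec(root @ A @ Sigma0)
kron_error = np.max(np.abs(lhs - rhs))

# The quadratic form in vec notation agrees with the Hilbert-Schmidt pairing.
op = np.kron(Sigma0, root) + np.kron(root, Sigma0)
quad_kron = vec(A) @ op @ vec(A)
quad_hs = np.trace((root @ A @ Sigma0 + Sigma0 @ A @ root) @ A.T)

print(kron_error, quad_kron, quad_hs)
```

The second comparison shows why the bound can be written either with Kronecker products acting on $vec(A)$ or as a Hilbert-Schmidt scalar product of matrices.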

Since $C_{BM}$ is a positive operator on $L^2([0, 1]; \mathbb{R})$, we can define the
bounded self-adjoint operator

\[
K_n^\sigma := I\big(\sigma^2 C_{BM} + \tfrac{1}{n}\mathrm{Id}\big)^{-1} I^* = \big(\sigma^2 \mathrm{Id} + \tfrac{1}{n} C_{BM}^{-1}\big)^{-1}.
\]

$C_{BM}$ is Hilbert-Schmidt and so is $K_n^\sigma$. We identify its kernel $\delta_n^\sigma : [0, 1]^2 \to \mathbb{R}$
(or Green function) as

\[
\delta_n^\sigma(t, s) = \frac{\sqrt{n}}{2\sigma \cosh(\sigma\sqrt{n})} \Big( \sinh\big(\sigma\sqrt{n}(1 - |s - t|)\big) + \sinh\big(\sigma\sqrt{n}(t + s - 1)\big) \Big).
\]

This can be formally derived from the properties $C_{BM}^{-1} = -D^2$ on its domain,
$\delta_n^\sigma$ in the domain (i.e. $\delta_n^\sigma(0, s) = 0$, $(\delta_n^\sigma)'(1, s) = 0$) and $\delta_n^\sigma(t, s) = K_n^\sigma \varepsilon_s(t)$.
Alternatively use the eigenvalue-eigenfunction decomposition of $C_{BM}$ and
apply functional calculus. The main observation is that $\delta_n^\sigma$ has all the properties
of a smoothing kernel, which for $n \to \infty$ concentrates on the diagonal
$\{t = s\}$, where it approximates the uniform law. This is best seen by the
approximation for large $n$

\[
\delta_n^\sigma(t, s) \approx \frac{\sqrt{n}}{2\sigma} \Big( \exp\big(-\sigma\sqrt{n}\,|s - t|\big) + \mathrm{sgn}(t + s - 1) \exp\big(\sigma\sqrt{n}(|t + s - 1| - 1)\big) \Big),
\]

observing $|t + s - 1| - 1 \le |s - t| + |2t - 1| - 1$ such that the second
exponential asymptotically only contributes at the corners $(0, 0)$ and $(1, 1)$
of the diagonal.
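The closed-form Green function can be cross-checked against the eigenvalue-eigenfunction route mentioned above. The sketch below assumes the operator $(\sigma^2\mathrm{Id} + n^{-1}C_{BM}^{-1})^{-1}$ and the standard Karhunen-Loeve eigenpairs of the Brownian covariance kernel $\min(s,t)$, namely $e_k(t) = \sqrt{2}\sin((k - 1/2)\pi t)$ with eigenvalues $((k - 1/2)\pi)^{-2}$. The function names and the chosen values of $n$, $\sigma$, $t$, $s$ are illustrative assumptions.

```python
import numpy as np

def green_closed_form(t, s, sigma, n):
    """Closed-form kernel: sqrt(n)/(2 sigma cosh(a)) * (sinh(a(1-|s-t|)) + sinh(a(t+s-1)))."""
    a = sigma * np.sqrt(n)
    return np.sqrt(n) / (2 * sigma * np.cosh(a)) * (
        np.sinh(a * (1 - abs(s - t))) + np.sinh(a * (t + s - 1))
    )

def green_spectral(t, s, sigma, n, terms=200_000):
    """Truncated spectral sum  sum_k (sigma^2 + w_k^2/n)^{-1} e_k(t) e_k(s)."""
    k = np.arange(1, terms + 1)
    w = (k - 0.5) * np.pi                    # w_k^2 are the eigenvalues of C_BM^{-1}
    weights = 1.0 / (sigma**2 + w**2 / n)    # functional calculus applied to each eigenvalue
    return float(np.sum(weights * 2.0 * np.sin(w * t) * np.sin(w * s)))

def green_large_n(t, s, sigma, n):
    """Large-n approximation with the two exponentials."""
    a = sigma * np.sqrt(n)
    return np.sqrt(n) / (2 * sigma) * (
        np.exp(-a * abs(s - t)) + np.sign(t + s - 1) * np.exp(a * (abs(t + s - 1) - 1))
    )

sigma, n = 1.0, 100
t, s = 0.3, 0.6
exact = green_closed_form(t, s, sigma, n)
spectral = green_spectral(t, s, sigma, n)
approx = green_large_n(t, s, sigma, n)
print(exact, spectral, approx)
```

The spectral sum converges only at rate $1/k^2$, so a large number of terms is used. The three values should agree up to the truncation error and the exponentially small terms dropped in the large-$n$ approximation.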

We shall see, however, that for the Hilbert-Schmidt norm evaluation we
face $(\delta_n^\sigma)^2$ as the operator kernel, which also behaves like a smoothing kernel
on the diagonal, but needs to be rescaled by $\|\delta_n^\sigma\|_{L^2}^2 = (1 + o(1))\sqrt{n}/(4\sigma^3)$.
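The stated rescaling can be illustrated through the spectral representation. The squared $L^2$-norm of the kernel equals the squared Hilbert-Schmidt norm of the operator, that is, the sum of its squared eigenvalues. The sketch assumes, as in the standard Karhunen-Loeve expansion of Brownian motion, that $C_{BM}^{-1}$ has eigenvalues $((k - 1/2)\pi)^2$. The function name and the chosen values of $n$ and $\sigma$ are illustrative.

```python
import numpy as np

def squared_hs_norm(sigma, n, terms=200_000):
    """Sum of squared eigenvalues of (sigma^2 Id + C_BM^{-1}/n)^{-1}."""
    k = np.arange(1, terms + 1)
    w = (k - 0.5) * np.pi
    return float(np.sum((sigma**2 + w**2 / n) ** (-2.0)))

sigma = 1.5
ratios = {}
for n in (100, 400, 1600):
    target = np.sqrt(n) / (4 * sigma**3)   # claimed leading term sqrt(n) / (4 sigma^3)
    ratios[n] = squared_hs_norm(sigma, n) / target

print(ratios)  # each ratio should be close to one
```

The ratios approach one very quickly in this example, which is consistent with a $(1 + o(1))$ factor.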

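Consequently, in terms of $K_n = \mathrm{diag}(K_n^{\Lambda_{jj}})_{j=1,\dots,d}$ and its kernel $\delta_n(t, s) =
\mathrm{diag}(\delta_n^{\Lambda_{ii}}(t, s))_{i=1,\dots,d}$, the Fisher information evaluates as

\[
\begin{aligned}
&\tfrac{1}{2}\,\mathrm{trace}\big(T_r I Q_{n,0}^{-1} I^* T_r^* M_M\, T_r I Q_{n,0}^{-1} I^* T_r^* M_M\big) \\
&\quad = \tfrac{1}{2}\,\mathrm{trace}\big((T_r K_n T_r^*) M_M (T_r K_n T_r^*) M_M\big) \\
&\quad = \tfrac{1}{2} \int_0^1\!\!\int_0^1 \mathrm{trace}_{\mathbb{R}^{d\times d}}\big(\delta_n(r(t), r(s)) M(s)\, \delta_n(r(s), r(t)) M(t)\big)\, dt\, ds.
\end{aligned}
\]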
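We now use $\int_0^1 M(s)\, a_n e^{-|t-s| a_n}\, ds = 2M(t)(1 + o(1))$ uniformly over $t \in
[b_n, 1 - b_n]$ whenever $a_n \to \infty$, $a_n b_n \to \infty$ and $M(t)$ is continuously
differentiable. Together with the asymptotic behaviour of $\delta_n^\sigma$ we obtain

\[
\begin{aligned}
&\int_0^1 \delta_n^{\Lambda_{ii}}(r_i(t), r_i(s))\, M_{ij}(s)\, \delta_n^{\Lambda_{jj}}(r_j(s), r_j(t))\, ds \\
&\quad = (1 + o(1)) \frac{n}{4\Lambda_{ii}\Lambda_{jj}} \int_0^1 \exp\big(-\sqrt{n}\,(\Lambda_{ii} r_i'(t) + \Lambda_{jj} r_j'(t))\,|t - s|\big) M_{ij}(s)\, ds \\
&\quad = (1 + o(1)) \frac{\sqrt{n}}{2\Lambda_{ii}\Lambda_{jj}(\Lambda_{ii} r_i'(t) + \Lambda_{jj} r_j'(t))} M_{ij}(t) \\
&\quad = \frac{\sqrt{n}\, M_{ij}(t)(1 + o(1))}{2\Lambda_{ii}\Lambda_{jj}(\lambda_i(t) + \lambda_j(t))}
\end{aligned}
\]

with $o(1)$ uniformly in $n$ and $t \in [n^{-p}, 1 - n^{-p}]$ for any $p \in (0, 1/2)$ to infer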



^(1 + 0(1)) r 1 * (Qho t )?. 

/o Aj(Aj + A-,)Aj 
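The Laplace-kernel approximation $\int_0^1 M(s)\, a_n e^{-|t-s| a_n}\, ds = 2M(t)(1 + o(1))$ used in the derivation above can be illustrated numerically at an interior point. This is a minimal sketch with a scalar, continuously differentiable test function. The choice $M(s) = 1 + s^2$, the point $t = 0.4$ and the trapezoidal quadrature are illustrative assumptions, not part of the original argument.

```python
import numpy as np

def smoothed(M, t, a, grid_size=400_001):
    """Trapezoidal approximation of  int_0^1 M(s) * a * exp(-a |t - s|) ds."""
    s = np.linspace(0.0, 1.0, grid_size)
    f = M(s) * a * np.exp(-a * np.abs(t - s))
    h = s[1] - s[0]
    return float(h * (f.sum() - 0.5 * (f[0] + f[-1])))

def M(s):
    """Smooth scalar test function standing in for an entry M_ij."""
    return 1.0 + s**2

t = 0.4
ratios = {a: smoothed(M, t, a) / (2.0 * M(t)) for a in (50.0, 200.0, 800.0)}
print(ratios)  # ratios should approach one as a grows
```

Away from the boundary the kernel $a e^{-a|t-s|}$ has total mass close to $2$ and concentrates at $s = t$, which is why the factor $2M(t)$ appears.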

Asymptotically for $n \to \infty$, neglecting terms of smaller order, this bound
is obtained by the worst parametric perturbation $H^*(t) = \Sigma_0 A \Sigma_0^{1/2} +
\Sigma_0^{1/2} A \Sigma_0$, which we evaluate using duality with respect to the scalar product
$\int_0^1 \sum_{i,j} A_{ij}(t) B_{ij}(t)\, dt$ as

UoEi J= iMm^t)dt) 2 Uo l T.i J= iMtm {t)dtf 



Jo 
Finally, remark that the Cramér-Rao inequality, e.g. [18, Thm. 2.5.10], is
applicable since $(N(0, Q_{n,\varepsilon}))_\varepsilon$ forms an exponential family in $(Q_{n,\varepsilon}^{-1})_\varepsilon$, which
is differentiable at $\varepsilon = 0$, and thus the models $(N(0, Q_{n,\varepsilon}))_\varepsilon$ as well as
$(N(0, C_{n,\varepsilon}))_\varepsilon$ are regular.